Method and apparatus for efficient management of XML documents

ABSTRACT

An in-memory storage manager represents XML-compliant documents as a collection of objects in memory. The collection of objects allows the storage manager to manipulate the document, or parts of the document, with a consistent interface and to provide for features that are not available in conventional XML documents, such as element attributes with types other than text and documents that contain binary rather than text information. In addition, in the storage manager, the XML-compliant document is associated with a schema document which defines the arrangement of the document elements and attributes. The schema data associated with a document can contain a mapping between document elements and program code to be associated with each element. The storage manager further has methods for retrieving the code from the element tag. The retrieved code can then be invoked using attributes and content from the associated element and the element then acts like a conventional object. Further, the storage manager allows real-time access by separate processes operating in different contexts. The objects that are used to represent the document are constructed from common code found locally in each process. In addition, the data in the objects is also stored in memory local to each process. The local memories are synchronized by means of a distributed memory system that continually equates the data copies of the same element in different processes. Client-specified collections are managed by a separate collection manager. The collection manager maintains a data structure called a “waffle” that represents the XML data structures in tabular form. A record set engine that is driven by user commands propagates a set of updates for a collection to the collection manager. Based on those updates, the collection manager updates index structures and may notify waffle users via the notification system.

RELATED APPLICATIONS

This application is a division of U.S. patent application Ser. No. 09/588,195 entitled Method and Apparatus for Efficient Management of XML Documents filed Jun. 6, 2002 by Raymond E. Ozzie, Kenneth G. Moore, Ransom L. Richardson and Edward J. Fischer (Atty. Docket No. G0008/7003).

FIELD OF THE INVENTION

This invention relates to storage and retrieval of information and, in particular, to storage and retrieval of information encoded in Extensible Markup Language (XML).

BACKGROUND OF THE INVENTION

Modern computing systems are capable of storing, retrieving and managing large amounts of data. However, while computers are fast and efficient at handling numeric data they are less efficient at manipulating text data and are especially poor at interpreting human-readable text data. Generally, present day computers are unable to understand subtle context information that is necessary to understand and recognize pieces of information that comprise a human-readable text document. Consequently, although they can detect predefined text orderings or pieces, such as words in an undifferentiated text document, they cannot easily locate a particular piece of information where the word or words defining the information have specific meanings. For example, human readers have no difficulty in differentiating the word “will” in the sentence “The attorney will read the text of Mark's will.”, but a computer may have great difficulty in distinguishing the two uses and locating only the second such use.

Therefore, schemes have been developed in order to assist a computer in interpreting text documents by appropriately coding the document. Many of these schemes identify selected portions of a text document by adding into the document information, called “markup tags”, which differentiates different document parts in such a way that a computer can reliably recognize the information. Such schemes are generally called “markup” languages.

One of these languages is called SGML (Standard Generalized Markup Language) and is an internationally agreed upon standard for information representation. This language standard grew out of development work on generic coding and mark-up languages, which was carried out in the early 1970s. Various lines of research merged into a subcommittee of the International Standards Organization called the subcommittee on Text Description and Processing Languages. This subcommittee produced the SGML standard in 1986.

SGML itself is not a mark-up language in that it does not define mark-up tags nor does it provide a markup template for a particular type of document. Instead, SGML denotes a way of describing and developing generalized descriptive markup schemes. These schemes are generalized because the markup is not oriented towards a specific application and descriptive because the markup describes what the text represents, instead of how it should be displayed. SGML is very flexible in that markup schemes written in conformance with the standard allow users to define their own formats for documents, and to handle large and complex documents, and to manage large information repositories.

Recently, another development has changed the general situation. The extraordinary growth of the Internet, and particularly, the World Wide Web, has been driven by the ability it gives authors, or content providers, to easily and cheaply distribute electronic documents to an international audience. SGML contains many optional features that are not needed for Web-based applications and has proven to have a cost/benefit ratio unattractive to current vendors of Web browsers. Consequently, it is not generally used. Instead, most documents on the Web are stored and transmitted in a markup language called the Hypertext Markup Language or HTML.

HTML is a simple markup language based on SGML and it is well suited for hypertext, multimedia, and the display of small and reasonably simple documents that are commonly transmitted on the Web. It uses a small, fixed set of markup tags to describe document portions. The small number of fixed tags simplifies document construction and makes it much easier to build applications. However, since the tags are fixed, HTML is not extensible and has very limited structure and validation capabilities. As electronic Web documents have become larger and more complex, it has become increasingly clear that HTML does not have the capabilities needed for large-scale commercial publishing.

In order to address the requirements of such large-scale commercial publishing and to enable the newly emerging technology of distributed document processing, an industry group called the World Wide Web Consortium has developed another markup language called the Extensible Markup Language (XML) for applications that require capabilities beyond those provided by HTML. Like HTML, XML is a simplified subset of SGML specially designed for Web applications and is easier to learn, use, and implement than full SGML. Unlike HTML, XML retains SGML advantages of extensibility, structure, and validation, but XML restricts the use of SGML constructs to ensure that defaults are available when access to certain components of the document is not currently possible over the Internet. XML also defines how Internet Uniform Resource Locators can be used to identify component parts of XML documents.

An XML document is composed of a series of entities or objects. Each entity can contain one or more logical elements and each element can have certain attributes or properties that describe the way in which it is to be processed. XML provides a formal syntax for describing the relationships between the entities, elements and attributes that make up an XML document. This syntax tells the computer how to recognize the component parts of each document.

XML uses paired markup tags to identify document components. In particular, the start and end of each logical element is clearly identified by entry of a start-tag before the element and an end-tag after the element. For example, the tags <to> and </to> could be used to identify the “recipient” element of a document in the following manner:

-   -   document text . . . <to>Recipient</to> . . . document text.

The form and composition of markup tags can be defined by users, but are often defined by a trade association or similar body in order to provide interoperability between users. In order to operate with a predefined set of tags, users need to know how the markup tags are delimited from normal text and the relationship between the various elements. For example, in XML systems, elements and their attributes are entered between matched pairs of angle brackets (< . . . >), while entity references start with an ampersand and end with a semicolon (& . . . ;). Because XML tag sets are based on the logical structure of the document, they are easy to read and understand.
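As an illustration only, the paired-tag and entity-reference conventions described above can be exercised with a general-purpose XML parser. The following minimal sketch uses the Python standard library; the surrounding <memo> element and the sentence text are hypothetical and are not taken from this description.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: only the <to> element comes from the text above.
# "&amp;" is an entity reference (ampersand ... semicolon).
SAMPLE = "<memo>Dear <to>Recipient</to>, see the R&amp;D report.</memo>"

root = ET.fromstring(SAMPLE)
recipient = root.find("to")

# Text enclosed by the start-tag <to> and the end-tag </to>.
print(recipient.text)
# Text following the end-tag, with the entity reference expanded.
print(recipient.tail)
```

The parser locates the recipient by its tags rather than by guessing at the meaning of the surrounding words, which is the point made in the background discussion.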

Since different documents have different parts or components, it is not practical to predefine tags for all elements of all documents. Instead, documents can be classified into “types” which have certain elements. A document type definition (DTD) indicates which elements to expect in a document type and indicates whether each element found in the document is not allowed, allowed and required or allowed, but not required. By defining the role of each document element in a DTD, it is possible to check that each element occurs in a valid place within the document. For example, an XML DTD allows a check to be made that a third-level heading is not entered without the existence of a second-level heading. Such a hierarchical check cannot be made with HTML. The DTD for a document is typically inserted into the document header and each element is marked with an identifier such as <!ELEMENT>.

However, unlike SGML, XML does not require the presence of a DTD. If no DTD is available for a document, either because all or part of the DTD is not accessible over the Internet or because the document author failed to create the DTD, an XML system can assign a default definition for undeclared elements in the document.

XML provides a coding scheme that is flexible enough to describe nearly any logical text structure, such as letters, reports, memos, databases or dictionaries. However, XML does not specify how an XML-compliant data structure is to be stored and displayed, much less efficiently stored and displayed. Consequently, there is a need for a storage mechanism that can efficiently manipulate and store XML-compliant documents.

SUMMARY OF THE INVENTION

In accordance with one embodiment of the invention, an in-memory storage manager represents XML-compliant documents as a collection of objects in memory. The collection of objects allows the storage manager to manipulate the document, or parts of the document, with a consistent interface and to provide for features that are not available in conventional XML documents, such as element attributes with types other than text and documents that contain binary, rather than text, information. In addition, in the storage manager, the XML-compliant document is associated with a schema document (which is also an XML document) that defines the arrangement of the document elements and attributes. The storage manager can operate with conventional storage services to persist the XML-compliant document. Storage containers contain pieces of the document that can be quickly located by the storage manager.

In accordance with another embodiment, the storage manager also has predefined methods that allow it to access and manipulate elements and attributes of the document content in a consistent manner. For example, the schema data can be accessed and manipulated with the same methods used to access and manipulate the document content.

In accordance with yet another embodiment, the schema data associated with a document can contain a mapping between document elements and program code to be associated with each element. The storage manager further has methods for retrieving the code from the element tag. The retrieved code can then be invoked using attributes and content from the associated element and the element then acts like a conventional object.

In all embodiments, the storage manager provides dynamic, real-time data access to clients by multiple processes in multiple contexts. Synchronization among multiple processes accessing the same document is coordinated with event-driven queues and locks. The objects that are used to represent the document are constructed from common code found locally in each process. In addition, the data in the objects is also stored in memory local to each process. The local memories are synchronized by means of a distributed memory system that continually equates the data copies of the same element in different processes.

In still another embodiment, client-specified collections are managed by a separate collection manager. The collection manager maintains a data structure called a “waffle” that represents the XML data structures in tabular form. A record set engine that is driven by user commands propagates a set of updates for a collection to the collection manager. Based on those updates, the collection manager updates index structures and may notify waffle users via the notification system. The waffle user may also navigate within the collection using cursors.

BRIEF DESCRIPTION OF THE DRAWINGS

The above and further advantages of the invention may be better understood by referring to the following description in conjunction with the accompanying drawings in which:

FIG. 1 is a schematic diagram of a computer system on which the inventive storage manager system can run.

FIG. 2 is a block schematic diagram illustrating the relationship of the in-memory storage manager and persistent storage.

FIG. 3 is a block schematic diagram illustrating the representation of an XML document on the storage manager memory as a collection of objects.

FIG. 4A is a block schematic diagram illustrating the components involved in binding code to XML elements.

FIG. 4B is a flowchart showing the steps involved in retrieving program code bound to an element.

FIG. 5 illustrates the relationship of XML text documents and binary sub-documents.

FIG. 6 is a block schematic diagram illustrating the major internal parts of the storage manager in different processes.

FIG. 7 illustrates the mechanism for synchronizing objects across processes.

FIG. 8 is an illustration that shows the major control paths from the storage manager APIs through the major internal parts of the storage manager.

FIG. 9 is an illustration of the storage manager interface constructed in accordance with an object-oriented implementation of the invention.

FIG. 10 is an illustration of the interfaces constructed in accordance with an object-oriented implementation of the invention, that are defined by the storage manager and may be called during the processing of links or element RPCs.

FIG. 11 is an illustration of the database and transaction interfaces constructed in accordance with an object-oriented implementation of the invention.

FIG. 12 is an illustration of the document and element interfaces constructed in accordance with an object-oriented implementation of the invention.

FIG. 13 is an illustration of the element communication and synchronization interfaces constructed in accordance with an object-oriented implementation of the invention.

FIG. 14 is an illustration that shows the major control paths from the collection manager APIs through the major internal parts of the collection and storage managers.

FIG. 15 is an illustration of the collection manager interfaces constructed in accordance with an object-oriented implementation of the invention.

DETAILED DESCRIPTION

FIG. 1 illustrates the system architecture for an exemplary client computer 100, such as an IBM THINKPAD 600®, on which the disclosed document management system can be implemented. The exemplary computer system of FIG. 1 is discussed only for descriptive purposes, however, and should not be considered a limitation of the invention. Although the description below may refer to terms commonly used in describing particular computer systems, the described concepts apply equally to other computer systems, including systems having architectures that are dissimilar to that shown in FIG. 1 and also to devices with computers in them, such as game consoles or cable TV set-top boxes, which may not traditionally be thought of as computers.

The client computer 100 includes a central processing unit (CPU) 105, which may include a conventional microprocessor, random access memory (RAM) 110 for temporary storage of information, and read only memory (ROM) 115 for permanent storage of information. A memory controller 120 is provided for controlling system RAM 110. A bus controller 125 is provided for controlling bus 130, and an interrupt controller 135 is used for receiving and processing various interrupt signals from the other system components.

Mass storage may be provided by diskette 142, CD-ROM 147, or hard disk 152. Data and software may be exchanged with client computer 100 via removable media, such as diskette 142 and CD-ROM 147. Diskette 142 is insertable into diskette drive 141, which is connected to bus 130 by controller 140. Similarly, CD-ROM 147 can be inserted into CD-ROM drive 146, which is connected to bus 130 by controller 145. Finally, the hard disk 152 is part of a fixed disk drive 151, which is connected to bus 130 by controller 150.

User input to the client computer 100 may be provided by a number of devices. For example, a keyboard 156 and a mouse 157 may be connected to bus 130 by keyboard and mouse controller 155. An audio transducer 196, which may act as both a microphone and a speaker, is connected to bus 130 by audio controller 197. It should be obvious to those reasonably skilled in the art that other input devices, such as a pen and/or tablet and a microphone for voice input, may be connected to client computer 100 through bus 130 and an appropriate controller. DMA controller 160 is provided for performing direct memory access to system RAM 110. A visual display is generated by a video controller 165, which controls video display 170.

Client computer 100 also includes a network adapter 190 that allows the client computer 100 to be interconnected to a network 195 via a bus 191. The network 195, which may be a local area network (LAN), a wide area network (WAN), or the Internet, may utilize general-purpose communication lines that interconnect multiple network devices.

Client computer system 100 generally is controlled and coordinated by operating system software, such as the WINDOWS NT™ operating system (available from Microsoft Corp., Redmond, Wash.). Among other computer system control functions, the operating system controls allocation of system resources and performs tasks such as process scheduling, memory management, networking and I/O services.

As illustrated in more detail in FIG. 2, the storage manager 206 resides in RAM 200 (equivalent to RAM 110 in FIG. 1) and provides an interface between an application program 202 which uses XML documents 228 and 230 and the persistent storage 208 in which the documents 228 and 230 are stored. The application 202 can interact with storage manager 206 by means of a consistent application programming interface 204 regardless of the type of persistent storage 208 used to store the objects. Internally, the storage manager 206 represents each document 210, 218, as a hierarchical series of objects 212-216 and 220-224, respectively. The storage manager 206 can store the documents 210 and 218 in persistent storage 208 as schematically illustrated by arrow 226 using a variety of file systems, such as directory-based file services, object stores and relational file systems.

The inventive system operates with conventional XML files. A complete XML file normally consists of three components that are defined by specific markup tags. The first two components are optional, the last component is required, and the components are defined as follows:

-   -   1. An XML processing statement which identifies the version of XML being used, the way in which it is encoded, and whether it references other files or not. Such a statement takes the form:
    -   <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    -   2. A document type definition (DTD) that defines the elements present in the file and their relationship. The DTD either contains formal markup tag declarations describing the type and content of the markup tags in the file in an internal subset (between square brackets) or references a file containing the relevant markup declarations (an external subset). This declaration has the form:
    -   <!DOCTYPE Appl SYSTEM "app.dat">
    -   3. A tagged document instance which consists of a root element, whose element type name must match the document type name in the document type declaration. All other markup elements are nested in the root element.

If all three components are present, and the document instance conforms to the document model defined in the DTD, the document is said to be “valid.” If only the last component is present, and no formal document model is present, but each element is properly nested within its parent elements, and each attribute is specified as an attribute name followed by a value indicator (=) and a quoted string, the document instance is said to be “well-formed.” The inventive system can work with and generate well-formed XML documents.
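The well-formedness conditions above (proper nesting, quoted attribute values) are the ones a non-validating parser enforces. The following minimal sketch, which uses the Python standard library parser as a stand-in and is not the storage manager's own parser, reports whether a string is well-formed without consulting any DTD. The function name is illustrative.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if text parses as well-formed XML (no DTD validation)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Properly nested elements with a quoted attribute value.
print(is_well_formed('<a><b x="1"/></a>'))
# Improper nesting: <b> is closed after its parent.
print(is_well_formed('<a><b></a></b>'))
# Attribute value is not a quoted string.
print(is_well_formed('<a x=1/>'))
```

A document that passes this check is only well-formed; deciding that it is “valid” would additionally require checking it against the document model in its DTD.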

Within the storage manager 206, XML documents are represented by means of data storage partitions which are collectively referred to by the name “Groove Document” to distinguish the representation from conventional XML documents. Each Groove document can be described by a DTD that formally identifies the relationships between the various elements that form the document. These DTDs follow the standard XML format. In addition, each Groove document has a definition, or schema, that describes the pattern of elements and attributes in the body of the document. XML version 1.0 does not support schemas. Therefore, in order to associate a Groove schema document with an XML data document, a special XML processing instruction containing a URI reference to the schema is inserted in the data document. This processing instruction has the form:

-   -   <?schema URI="groovedocument:///GrooveXSS/$PersistRoot/sample.xml"?>
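By way of a hedged illustration, a reader of the data document could recover the schema reference by scanning the top-level processing instructions for the target “schema”. The sketch below uses the Python standard library DOM; the helper name schema_uri and the empty <doc/> root element are assumptions made for the example, and the storage manager's actual lookup is not shown in this description.

```python
import re
from xml.dom import minidom

# Hypothetical data document carrying the schema processing instruction.
SAMPLE = (
    '<?xml version="1.0"?>'
    '<?schema URI="groovedocument:///GrooveXSS/$PersistRoot/sample.xml"?>'
    '<doc/>'
)

def schema_uri(xml_text):
    """Return the URI named by a top-level <?schema ...?> instruction, or None."""
    document = minidom.parseString(xml_text)
    for node in document.childNodes:
        is_pi = node.nodeType == node.PROCESSING_INSTRUCTION_NODE
        if is_pi and node.target == "schema":
            # The instruction data is plain text, so the pseudo-attribute
            # is extracted with a pattern rather than an attribute API.
            match = re.search(r'URI\s*=\s*"([^"]*)"', node.data)
            if match:
                return match.group(1)
    return None

print(schema_uri(SAMPLE))
```

Because a processing instruction is legal in any XML 1.0 document, a parser that knows nothing about schemas simply ignores it, which is why this form can be used with conventional XML files.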

Some elements do not have, or require, content and act as placeholders that indicate where a certain process is to take place. A special form of tag is used in XML to indicate empty elements that do not have any contents, and therefore, have no end-tag. For example, a <ThumbnailBox> element is typically an empty element that acts as a placeholder for an image embedded in a line of text and would have the following declaration within a DTD:

-   -   <!ELEMENT ThumbnailBox EMPTY>

Where elements can have variable forms, or need to be linked together, they can be given suitable attributes to specify the properties to be applied to them. These attributes are specified in a list. For example, it might be decided that the <ThumbnailBox> element could include Location and Size attributes. A suitable attribute list declaration for such an element would be as follows:

    <!ATTLIST ThumbnailBox
      Location ENTITY #REQUIRED
      Size CDATA #IMPLIED >

This tells the computer that the <ThumbnailBox> element includes a required Location entity and may include a Size attribute. The keyword #IMPLIED indicates that it is permissible to omit the attribute in some instances of the <ThumbnailBox> element.
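The effect of the #REQUIRED and #IMPLIED keywords can be sketched as a simple check on an element instance. The table and function below are illustrative assumptions written in Python; they model only the two keywords discussed above and are not a DTD validator.

```python
# Declared attributes for <ThumbnailBox>, taken from the ATTLIST above:
# True marks #REQUIRED, False marks #IMPLIED.
THUMBNAIL_BOX_ATTRS = {"Location": True, "Size": False}

def missing_required(attributes, declared=THUMBNAIL_BOX_ATTRS):
    """Return the required attribute names absent from an element instance."""
    return sorted(
        name for name, required in declared.items()
        if required and name not in attributes
    )

# Size omitted: permitted, because Size is #IMPLIED.
print(missing_required({"Location": "BinDoc3487"}))
# Location omitted: reported, because Location is #REQUIRED.
print(missing_required({"Size": "Autosize"}))
```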

XML also permits custom definition statements similar to the #DEFINE statements used with some compilers. Commonly used definitions can be declared within the DTD as “entities.” A typical entity definition could take the form:

-   -   <!ENTITY BinDoc3487 SYSTEM "./3487.gif" NDATA>

which defines a file location for the binary document “BinDoc3487.” Once such a declaration has been made in the DTD, users can use a reference in place of the full value. For example, the <ThumbnailBox> element described previously could be specified as <ThumbnailBox Location=BinDoc3487 Size="Autosize"/>. An advantage of using this technique is that, should the defined value change at a later time, only the entity declaration in the DTD will need to be updated as the entity reference will automatically use the contents of the current declaration.

Within the storage manager, each document part is identified by a Uniform Resource Identifier (URI) which conforms to a standard format such as specified in RFC 2396. URIs can be absolute or relative, but relative URIs must be used only within the context of a base, absolute URI. When the document is stored in persistent storage, its parts may be identified by a different STORAGEURI that is assigned and managed by the particular file system in use.

In accordance with the principles of the invention, each document part is represented in the storage manager internal memory by a collection of objects. For example, separate elements in the XML document are represented as element objects in the storage manager. This results in a structure that is illustrated in FIG. 3. In FIG. 3, an illustrative XML document 300 is represented as a collection of objects in storage manager 302. In particular, the XML document 300 contains the conventional XML processing statement 304 which identifies the XML version, encoding and file references as discussed above. Document 300 also contains an XML processing statement 306 which identifies a schema document 320 in storage manager 302 which is associated with the document 300. The illustrative XML document also contains a set of hierarchical elements, including ElementA 308 which contains some text 318. ElementA contains ElementB 310 which has no text associated with it. ElementB contains ElementC 312, which, in turn, contains two elements. Specifically, ElementC contains ElementD 314 that has an attribute (ID, with a value “foo”) and ElementE 316.

In the storage manager 302, the elements, ElementA-ElementE, are represented as element objects arranged in a hierarchy. In particular, ElementA is represented by ElementA object 322. Each element object contains the text and attributes included in the corresponding XML element. Therefore, element object 322 contains the text 318. Similarly, ElementB 310 is represented by element object 324 and elements ElementC, ElementD and ElementE are represented by objects 326, 328 and 330, respectively. Element object 328, which represents element ElementD, also includes the attribute ID that is included in the corresponding element. Each element object references its child element objects by means of database pointers (indicated by arrows between the objects) in order to arrange the element objects into a hierarchy. There may also be attribute indices, such as index 332 that indexes the ID attribute in element object 328.
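The arrangement of FIG. 3 can be approximated, for illustration only, with ordinary in-process objects. In the Python sketch below, the class ElementObject, the function build_index and the placeholder text are assumptions; in particular, plain object references stand in for the database pointers mentioned above, and the dictionary stands in for an attribute index such as index 332.

```python
from dataclasses import dataclass, field

@dataclass
class ElementObject:
    """One element object: its text, its attributes, and its child objects."""
    name: str
    text: str = ""
    attributes: dict = field(default_factory=dict)
    children: list = field(default_factory=list)

    def add(self, child):
        # A plain reference stands in for a database pointer to the child.
        self.children.append(child)
        return child

def build_index(root, attr_name):
    """Map each value of attr_name to the element objects that carry it."""
    index = {}
    pending = [root]
    while pending:
        node = pending.pop()
        if attr_name in node.attributes:
            index.setdefault(node.attributes[attr_name], []).append(node)
        pending.extend(node.children)
    return index

# The hierarchy of FIG. 3; "some text" is a placeholder for text 318.
element_a = ElementObject("ElementA", text="some text")
element_b = element_a.add(ElementObject("ElementB"))
element_c = element_b.add(ElementObject("ElementC"))
element_d = element_c.add(ElementObject("ElementD", attributes={"ID": "foo"}))
element_e = element_c.add(ElementObject("ElementE"))

id_index = build_index(element_a, "ID")
print(id_index["foo"][0].name)
```

With such an index, an element can be found from its ID value directly, without walking the hierarchy from ElementA down to ElementD.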

The representation of the XML document 300 by means of an object collection allows the storage manager 302 to manipulate its internal representation of the document 300 with a consistent interface that is discussed in detail below. The storage manager 302 can also provide features that are not available in conventional XML documents, such as collection services that are available via a collection manager that is also discussed in detail below.

As described above, Groove documents that contain XML data may have a definition, or schema document, that describes the pattern of elements and attributes in the body of the document. The schema document is stored in a distinct XML document identified by a URI. The schema document has a standard XML DTD definition, called the meta-schema, which is shown below:

    <!-- The Document element is the root element in the schema -->
    <!ELEMENT Document (Registry*, AttrGroup*, ElementDecl*)>
    <!ATTLIST Document
      URL CDATA #REQUIRED >
    <!ELEMENT Registry TagToProgID*>
    <!ELEMENT TagToProgID EMPTY>
    <!ATTLIST TagToProgID
      Tag CDATA #REQUIRED
      ProgID CDATA #REQUIRED >
    <!ELEMENT AttrGroup AttrDef*>
    <!ELEMENT AttrDef EMPTY>
    <!ATTLIST AttrDef
      Name CDATA #REQUIRED
      Type CDATA #REQUIRED
      Index CDATA #IMPLIED
      DefaultValue CDATA #IMPLIED >
    <!ELEMENT ElementDecl (ElementDecl* | AttrGroup | ElementRef*)>
    <!ATTLIST ElementDecl
      Name CDATA #REQUIRED >
    <!ELEMENT ElementRef EMPTY>
    <!ATTLIST ElementRef
      Ref CDATA #REQUIRED >

Each of the elements in the schema defines information used by the storage manager while processing the document. The “Registry” section forms an XML representation of a two-column table that maps XML element tags to Windows ProgIDs. (In the Component Object Model (COM) developed by Microsoft Corporation, a ProgID is a text name for an object that, in the COM system, is “bound” to, or associated with, a section of program code. The mapping between a given ProgID and the program code, which is stored in a library, is specified in a definition area such as the Windows™ registry.)

This arrangement is shown in FIG. 4A that illustrates an XML document 400 and its related schema document 402. Both of these documents are resident in the storage manager 406 and would actually be represented by objects as shown in FIG. 3. However, in FIG. 4A, the documents have been represented in conventional XML format for clarity. FIG. 4A shows the storage manager operational in a Windows™ environment that uses objects constructed in accordance with the Component Object Model (COM) developed by the Microsoft Corporation, Redmond, Wash.; however, the same principles apply in other operating system environments.

XML document 400 includes the normal XML processing statement 414 that identifies the XML version, encoding and file references. A schema XML processing statement 416 references the schema document 402 which schema document is associated with document 400 and has the name “urn:groove.net:sample.xml” defined by name statement 426. It also includes a root element 418 which defines a name “doc.xml” and the “g” XML namespace which is defined as “urn:groove.net”.

Document 400 has three other elements, including element 420 defined by tag “urn:groove.net:AAA”, element 422 defined by tag “urn:groove.net:BBB” and element 424 defined by tag “urn:groove.net:NoCode”. Element 424 is a simple element that has no corresponding bound code and no corresponding tag-to-ProgID mapping in the schema document 402.

Within the “registry” section defined by tag 428, the schema document 402 has two element-to-COM ProgID mappings defined. One mapping is defined for elements with the tag “urn:groove.net:AAA” and one for elements with the tag “urn:groove.net:BBB.” The bound code is accessed when the client application 404 invokes a method “OpenBoundCode( ).” The syntax for this invocation is given in Table 15 below and the steps involved are illustrated in FIG. 4B. Invoking the OpenBoundCode( ) method on a simple element, such as element 424, generates an exception. The process of retrieving the bound code starts in step 434 and proceeds to step 436 in which the OpenBoundCode( ) method is invoked. Invoking the OpenBoundCode( ) method on an element with the element tag “urn:groove.net:AAA” causes the storage manager 406 to consult the registry element 428 in the schema document 402 with the element tag as set forth in step 438. From section 430, the storage manager retrieves the ProgID “Groove.Command” as indicated in step 440. In step 442, the storage manager calls the COM manager 408 and instructs it to create an object with this ProgID. In a conventional, well-known manner, in step 444, the COM manager translates the ProgID to a CLSID using a key in the Windows Registry 410. In step 446, the COM manager uses the CLSID to find a dynamically loadable library (DLL) file in the code database 412 that has the code for the object. Finally, in step 448, the COM manager creates the object and returns an interface pointer for the object to the storage manager 406 which, in turn, returns the pointer to the client application 404. The routine then finishes in step 450. The client application 404 can then use the pointer to invoke methods in the code that use attributes and content in the associated element. The element then behaves like any other COM object. A similar process occurs if the OpenBoundCode( ) method is invoked on elements with the tag “urn:groove.net:BBB.”
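The steps of FIG. 4B can be sketched as a chain of table lookups. The Python sketch below is an analogy only: it does not call COM, the dictionaries stand in for the schema registry 428, the Windows Registry 410 and the code database 412, and the CLSID string, the class CommandObject and the function open_bound_code are hypothetical names introduced for the example.

```python
class NoBoundCodeError(Exception):
    """Raised when an element tag has no tag-to-ProgID mapping in the schema."""

class CommandObject:
    """Hypothetical bound code that works on the element it was created for."""
    def __init__(self, element):
        self.element = element

    def describe(self):
        return "bound to " + self.element["tag"]

# Stand-in for the schema "Registry" section: element tag -> ProgID.
SCHEMA_REGISTRY = {"urn:groove.net:AAA": "Groove.Command"}
# Stand-in for the Windows Registry: ProgID -> CLSID (made-up value).
PROGID_TO_CLSID = {"Groove.Command": "{00000000-0000-0000-0000-000000000001}"}
# Stand-in for the code database: CLSID -> code that creates the object.
CLSID_TO_FACTORY = {"{00000000-0000-0000-0000-000000000001}": CommandObject}

def open_bound_code(element):
    """Return an object bound to the element, following steps 438 to 448."""
    prog_id = SCHEMA_REGISTRY.get(element["tag"])      # steps 438-440
    if prog_id is None:
        raise NoBoundCodeError(element["tag"])         # simple element
    clsid = PROGID_TO_CLSID[prog_id]                   # step 444
    factory = CLSID_TO_FACTORY[clsid]                  # step 446
    return factory(element)                            # step 448

bound = open_bound_code({"tag": "urn:groove.net:AAA", "attributes": {}})
print(bound.describe())
```

Calling open_bound_code on an element tagged “urn:groove.net:NoCode” raises NoBoundCodeError in this sketch, mirroring the exception described for simple elements such as element 424.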

The “AttrGroup” section defines non-XML characteristics for attributes. An attribute's data type can be defined as some type other than text and the attribute may be indexed to facilitate fast retrieval of the elements that contain it.

The “ElementDecl” section provides a form of element definition similar to the DTD <!ELEMENT> declaration, but allows for extended attribute characteristics and the definition of non-containment element references.

The following example shows sample portions of a schema document for an XML document that defines a “telespace” as previously described.

<groove:Document URL=“TelespaceSchema.xml”
    xmlns:groove=“urn:groove.net:schema.1”>
  <groove:Registry>
    <groove:TagToProgID groove:Tag=“g:Command”
        groove:ProgID=“Groove.Command”/>
    <groove:TagToProgID groove:Tag=“groove:PropertySetChanged”
        groove:ProgID=“Groove.PropSetChangeAdvise”/>
  </groove:Registry>
  <groove:AttrGroup>
    <groove:AttrDef Name=“ID” Index=“true”/>
    <!-- KEY EXCHANGE ATTRIBUTES -->
    <groove:AttrDef Name=“NKey” Type=“Binary”/>
    <groove:AttrDef Name=“ReKeyId” Type=“String”/>
    <groove:AttrDef Name=“T” Type=“String”/>
    <!-- AUTHENTICATION ATTRIBUTES -->
    <groove:AttrDef Name=“MAC” Type=“Binary”/>
    <groove:AttrDef Name=“Sig” Type=“Binary”/>
    <!-- ENCRYPTION ATTRIBUTES -->
    <groove:AttrDef Name=“IV” Type=“Binary”/>
    <groove:AttrDef Name=“EC” Type=“Binary”/>
    <!-- XML Wrapper Attributes -->
    <groove:AttrDef Name=“Rows” Type=“Long”/>
    <groove:AttrDef Name=“Cols” Type=“Long”/>
    <groove:AttrDef Name=“Items” Type=“Long”/>
    <groove:AttrDef Name=“ItemID” Type=“Bool” Index=“true”/>
  </groove:AttrGroup>
  <groove:ElementDecl Name=“groove:Telespace”>
    <AttrGroup>
      <AttrDef Name=“Persist” DefaultValue=“True” Type=“Bool”/>
      <AttrDef Name=“Access” DefaultValue=“Identity”
          Type=“String”/>
    </AttrGroup>
    <ElementRef Element=“Dynamics”/>
    <ElementRef Element=“Members”/>
  </groove:ElementDecl>
</groove:Document>

In this example, there are two entries in the Tag to ProgID mapping table. The first maps the tag “g:Command” (which, using XML namespace expansion, is “urn:groove.net.schema.1:Command”) to the ProgID “Groove.Command.” In the section defining attributes, the “ID” attribute is indexed, the data type of the NKey attribute is binary, and so on.

This schema data is represented by element objects and can be accessed and manipulated by the same storage manager element and attribute interface methods used to manipulate documents as described in detail below. In particular, the information that describes a document can be manipulated using the same interfaces that are used for manipulating the document content.

In accordance with another aspect of the invention, sub-documents can be associated with a primary document. Any document may be a sub-document of a given document. If a document contains a sub-document reference to another document, then the referenced document is a sub-document. If two documents contain sub-document references to each other, then each document is a sub-document of the other document. Each sub-document is referenced from the primary document with conventional XML XLink language, which is described in detail at website www.w3.org/TR/xlink. Links may also establish a relationship between an all-text XML document and a binary sub-document. Binary documents do not have links to any kind of sub-document. If the link is to a document fragment, a sub-document relationship is established with the document that contains the fragment. The relationship of documents and sub-documents is illustrated in FIG. 5.

For example, main document 500 contains links 502 which include a link, represented by arrow 510, to document 504 and a link, represented by arrow 508, to a binary document 506. Documents 504 and 506 are thus sub-documents of document 500. Document 504, in turn, contains links 512 which include a link, represented by arrow 514, to document 516 with content 518. Document 516 is a sub-document of document 500. Document 506 contains binary content 520 and, therefore, cannot have links to sub-documents.
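The relationships of FIG. 5 can be modeled as a reachability computation over link targets. The following is a minimal sketch only; the function name and the dictionary representation of links are assumptions for illustration and are not part of the storage manager interface.

```python
def sub_documents(doc_id, links, binary_docs):
    """Return every document reachable from doc_id through sub-document links.

    links maps a document to the documents it links to; binary_docs is the
    set of binary documents, which never contribute links of their own.
    """
    found = set()
    pending = [doc_id]
    while pending:
        current = pending.pop()
        if current in binary_docs:
            continue  # binary documents cannot link to sub-documents
        for target in links.get(current, ()):
            if target not in found:
                found.add(target)
                pending.append(target)
    found.discard(doc_id)  # a document is not listed as its own sub-document
    return found


# The arrangement of FIG. 5: 500 links to 504 and binary 506; 504 links to 516.
FIG5_LINKS = {500: [504, 506], 504: [516]}
FIG5_BINARY = {506}
```

With the FIG. 5 data, sub_documents(500, FIG5_LINKS, FIG5_BINARY) yields documents 504, 506 and 516, and two documents that reference each other each appear as a sub-document of the other.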

Sub-document links follow the standard definition for simple links. An exemplary element definition of a link is as follows:

<!ELEMENT GrooveLink ANY>
<!ATTLIST GrooveLink
  xml:link CDATA #FIXED “simple”
  href CDATA #REQUIRED
  role CDATA #IMPLIED “sub-document”
  title CDATA #IMPLIED
  show (parsed|replace|new) #IMPLIED
  actuate (auto|user) #IMPLIED
  serialize (byvalue|byreference|ignored) #IMPLIED
  behavior CDATA #IMPLIED
  content-role CDATA #IMPLIED
  content-title CDATA #IMPLIED
  inline (true|false) #IMPLIED “true” >

It is also possible to establish a sub-document relationship without using the above definition by adding to a document an XML link which has an xml:link attribute with a value “simple”, and an href attribute. Such a link will establish a sub-document relationship to the document identified by a URI value in the href attribute.

Given the relationships from a document to its sub-documents, it is possible to make a copy of an arbitrary set of documents and sub-documents. Within a single storage service, it may be possible to directly perform such a copy. To cross storage services or to send multiple documents to another machine, the entire hierarchy of such documents must be “describable” in a serialized fashion. The inventive Storage Manager serializes multiple documents to a text representation conforming to the specification of MIME Encapsulation of Aggregate Documents, such as HTML (MHTML), which is described in detail at website ftp.isi.edu/in-notes/rfc2557.txt.

The following data stream fragment is an example of a document and a referenced sub-document as they would appear in an MHTML character stream. In the example, “SP” means one space is present and “CRLF” represents a carriage return-line feed ASCII character pair. All other characters are transmitted literally. The MIME version header has the normal MIME version and the Groove protocol version is in an RFC822 comment. The comment is just the word “Groove” followed by an integer. The boundary separator string is unique, so a system that parses the MIME, and then each body part, will work correctly. The serialized XML text is illustrated in UTF-8 format, but it could also be transmitted in WBXML format. The XML document has an XML prefix, which includes the version and character encoding. The binary document is encoded in base64.

MIME-Version: SP 1.0 SP (Groove SP 2) CRLF
Content-Type: SP multipart/related; SP boundary=“<<[[&&&]]>>” CRLF
CRLF
--<<[[&&&]]>>
Content-Type: SP text/XML; SP charset=“UTF-8”
<?xml version=“1.0” encoding=‘utf-8’?>
<rootelement> . . . </rootelement> CRLF
CRLF
--<<[[&&&]]>>
Content-ID: SP <URI> CRLF
Content-Type: SP application/octet-stream CRLF
Content-Transfer-Encoding: base64 CRLF
CRLF
R0IGODIhdQAgAPcAAP//////zP//mf//Zv//M///AP/M///MzP/Mmf/MZv/MM//MAP+Z//+ZzP+Zmf+ZZv+ZM/+ZAP9m//9mzP9mmf9mZv9mM/9mAP8z//8zzP8zmf8zZv8zM/8zAP8A//8AzP8Amf8AZv8AM/8AAMz//8z/zMz/mcz/Zsz/M8z/AMzM/8zMzMzMmczMZszMM8zMAMyZ/8yZzMyZmcyZZsyZM8yZAMxm/8xmzMxmmcxmZsxmM8xmAMwz/8wzzMwzmcwzZswzM8wzAMwA/8wAzMwAmcwAZswAM8wAAJn//5n/zJn/mZn/Zpn/M5n/AJnM/5nMzJnMmZnMZpnMM5nMAJmZ/5mZOG/qTMnzJUWQHoMKHUq0KEagRpMqXaoUaU6dG2IKIOqRKtOkTq9q3VrV5sd/XMOKZZp1rNmzGsuiXct2hNq2cMVmXdkzZ12LLe/ehYrXpsy/MPUGHvw04IzCdhFbzasYMd+aUxsnnrzTq1uwcTN3tVrxrebPWDGDHr3UM+nTHE2jXn1RNevXEI3Dfi179urDJrte5BzVcknNhyNHZiyzJnGvuWMuppu7uHLkyV1Kxe1ccOGZ0Cn/xshcu8/K2g2LQ8bJGPJj4eh3+/WNHb118PAtBn8aXTrn6s7tl2QP9b399fhNN55tbe31FYEITIRbgqAtyCBwAz5I20MUVmjhhRgyFBAAOw==
--<<[[&&&]]>>--
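A comparable multipart/related stream can be produced with a general-purpose MIME library. The sketch below uses the Python standard library's email package purely as an illustration; it is not the storage manager's serializer. The function names and the example URI are assumptions, the library chooses its own boundary string, and the Groove protocol version comment in the MIME-Version header is omitted.

```python
from email import message_from_string
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def serialize_documents(xml_text, binary_data, binary_uri):
    """Pack an XML document and one binary sub-document into one text stream."""
    bundle = MIMEMultipart("related")
    # First body part: the XML text, declared as text/xml in UTF-8.
    bundle.attach(MIMEText(xml_text, "xml", "utf-8"))
    # Second body part: the binary sub-document, base64-encoded by default.
    part = MIMEApplication(binary_data, "octet-stream")
    part["Content-ID"] = "<" + binary_uri + ">"
    bundle.attach(part)
    return bundle.as_string()


def deserialize_documents(stream):
    """Recover the XML text, the binary bytes and the Content-ID."""
    parsed = message_from_string(stream)
    xml_part, binary_part = parsed.get_payload()
    xml_text = xml_part.get_payload(decode=True).decode("utf-8")
    return xml_text, binary_part.get_payload(decode=True), binary_part["Content-ID"]
```

The Content-ID header plays the role of the <URI> placeholder in the fragment above, so a link's href value can be matched to the body part that carries the referenced sub-document.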

Unlike most XML processors, such as document editors or Internet browsers, the storage manager provides for concurrent document operations. Documents may be concurrently searched, elements may be concurrently created, deleted, updated, or moved. Copies of element hierarchies may be moved from one document to another. In most XML processors, all of the updates to a document are driven by a single user, who is usually controlling a single thread within a single process on a single computer.

The storage manager maintains XML document integrity among many users updating the same document, using multiple threads in multiple processes. In a preferred embodiment, all of the updates occur on a single computer, but, using other conventional inter-processor communication mechanisms, other operational embodiments are possible. FIG. 6 shows the basic structure of the storage manager and illustrates how it isolates application programs from cross-process communication issues. For example, two separate processes 600 and 602 may be operating concurrently in the same computer or in different computers. Process 600 is a “home” process as described below, while process 602 is another process designated as Process N. Within process 600, a multi-threaded client application program 606 is operating and within process 602, a multi-threaded client application program 616 is operating.

Each application program 606 and 616 interfaces with a storage manager designated as 605 and 615, respectively. In process 600, the storage manager comprises a storage manager interface layer 608 which is used by application program 606 to control and interface with the storage manager. It comprises the database, document, element and schema objects that are actually manipulated by the application. The API exported by this layer is discussed in detail below. The storage manager 605 also includes distributed virtual object (DVO) database methods 610, DVO methods for fundamental data types 612, DVO common system methods 609 and distributed shared memory 614. Similarly, the storage manager operating in process 602 includes transaction layer 618, DVO database methods 620, DVO methods for fundamental data types 622, DVO common system methods 617 and distributed shared memory 624.

The two processes 600 and 602 communicate via a conventional message passing protocol or inter-process communication (IPC) system 604. For processes that run in a single computer, such a system can be implemented in the Windows® operating system by means of shared memory buffers. If the processes are running in separate computers, another message passing protocol, such as TCP/IP, can be used. Other conventional messaging or communications systems can also be used without modifying the operation of the invention. However, as is shown in FIG. 6, application programs 606 and 616 do not directly interact with the message passing system 604. Instead, the application programs 606 and 616 interact with storage managers 605 and 615, respectively, and storage managers 605 and 615 interact with the message passing system 604 via a distributed shared memory (DSM) system of which DSM systems 614 and 624 are a part.

A number of well-known DSM systems exist and are suitable for use with the invention. In accordance with a preferred embodiment, the DSM system used with the storage manager is called a C Region Library (CRL) system. The CRL system is an all-software distributed shared memory system intended for use on message-passing multi-computers and distributed systems. A CRL system and code for implementing such a system are described in detail in an article entitled “CRL: High-Performance All-Software Distributed Memory System”, K. L. Johnson, M. F. Kaashoek and D. A. Wallach, Proceedings of the Fifteenth Symposium on Operating Systems Principles, ACM, December 1995; and “CRL version 1.0 User Documentation”, K. L. Johnson, J. Adler and S. K. Gupta, MIT Laboratory for Computer Science, Cambridge, Mass. 02139, August 1995. Both articles are available at web address www.pdos.lcs.mit.edu/crl.

Parallel applications built on top of the CRL, such as the storage manager, share data through memory “regions.” Each region is an arbitrarily sized, contiguous area of memory. Regions of shared memory are created, mapped in other processes, unmapped, and destroyed by various functions of the DSM system. The DSM system used in the present invention provides a super-set of the functions that are used in the CRL DSM system. Users of memory regions synchronize their access by declaring to the DSM when they need to read from, or write to, a region, and then, after using a region, declaring the read or write complete. The effects of write operations are not propagated to other processes sharing the region until those processes declare their need for it. In addition to the basic shared memory and synchronization operations, DSM provides error handling and reliability with transactions. The full interface to the inventive DSM is shown in Table 1.

TABLE 1

DSM Method
    Description

AddNotification(DSMRgn* i_pRgn, const IGrooveManualResetEvent * i_pEvent);
    Adds a local event that will be signaled when the data in the region changes.

Close( );
    Shuts down the DSM. There must be no mapped regions at this client.

Create(UINT4 i_Size, INT4 i_CallbackParam, INCAddress i_InitialOwner, DSMRId & io_RId, DSMRgn * & o_pRgn, void * & o_pData);
    Creates a new region. It also atomically maps the new region and initiates a StartWrite on the new region if Size is non-zero. Size is the initial size of the data in the new region. RId is the identifier of the new region. pRgn is the new region if Size is non-zero.

AddDatabase(UINT2 i_DatabaseNumber);
    Adds a new database to the region mapping tables.

DatabaseFlushNotify(UINT2 i_DatabaseNumber, TimeMillis i_StartTime);
    Cleans up unused region resources.

Destroy(DSMRId& i_RId);
    Destroys an existing region entirely. RId is a valid identifier of the region to be destroyed.

EndRead(DSMRgn* i_pRgn);
    Closes a read operation on the region's data. pRgn is the valid region.

EndWrite(DSMRgn* i_pRgn);
    Closes a write operation on the region's data. pRgn is the valid region.

Flush(DSMRgn* i_pRgn);
    Flushes the region from this client's local cache to the region's home client. pRgn is the valid region.

GetSize(DSMRgn* i_pRgn);
    Returns the size (number of bytes) of the given valid region. pRgn is the valid region.

Init(CBSTR i_BroadcastGroup, DSMRgnMapCallback * i_pCallback = NULL, void * i_pCallbackParam = NULL, BOOL * o_pMasterClient = NULL, UINT4 i_WaitTimeOut = 1000, UINT4 i_URCSize = 1<<10, INCAddress * o_pAddress = NULL);
    Initializes the DSM. BroadcastGroup is the name of the group in which this DSM client belongs. URCSize is the size of the Unmapped Regions Cache. PAddress is the Inter-node Communication Address of this DSM client. pMasterClient specifies whether this DSM client is the Master (First) client.

Map(const DSMRId& i_RId, INT4 i_CallbackParam, BOOL i_InitialOwner);
    Maps the region to this client's memory space. RId is a valid identifier of the region to be mapped.

RemoveDatabase(UINT2 i_DatabaseNumber);
    Removes the specified database from the region mapping tables.

RemoveNotification(DSMRgn* i_pRgn, const IGrooveManualResetEvent * i_pEvent);
    Removes interest in changes to data in a region.

Resize(DSMRgn* i_pRgn, UINT4 i_Size);
    Resizes the given valid region while maintaining the original data (which may be truncated if the size is decreased). pRgn is the valid region. Size is the new size.

GetRId(const DSMRgn* i_pRgn);
    Returns the identifier for the given valid region. pRgn is the valid region.

SignalNotification(DSMRgn* i_pRgn);
    Sets the signal that notification has occurred.

StartRead(DSMRgn* i_pRgn, INT4 i_CallbackParam, void * & o_pData);
    Initiates a read operation on the region's data. RgnStartRead (or RgnStartWrite) must be called before the data can be read. pRgn is the valid region.

StartTransactionRead(DSMRgn* i_pRgn, INT4 i_CallbackParam, void * & o_pData);
    Initiates a transactional read operation on the region's data. RgnStartRead (or RgnStartWrite) must be called before the data can be read. pRgn is the valid region.

StartTransactionWrite(DSMRgn* i_pRgn, INT4 i_CallbackParam, void * & o_pData);
    Initiates a transactional write operation on the region's data. RgnStartWrite must be called before the data can be modified. pRgn is the valid region.

StartWrite(DSMRgn* i_pRgn, INT4 i_CallbackParam, void * & o_pData);
    Initiates a write operation on the region's data. RgnStartWrite must be called before the data can be modified. pRgn is the valid region.

Unmap(DSMRgn* & io_pRgn);
    Unmaps the region from this client's memory space. pRgn is the valid region to be unmapped.
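The declare-before-use discipline behind the start and end operations can be illustrated with a greatly simplified, single-threaded sketch. It is not the DSM implementation: the HomeNode and DsmClient classes and their lower-case method names are hypothetical, locking and message passing are omitted, and a version counter stands in for the real coherence protocol. The sketch shows only that a writer's changes reach another client when that client next declares its need for the region.

```python
class HomeNode:
    """Holds the authoritative copy of each region."""

    def __init__(self):
        self._regions = {}  # region id -> (version, bytes)

    def create(self, rid, data):
        self._regions[rid] = (1, bytes(data))

    def fetch(self, rid):
        return self._regions[rid]

    def publish(self, rid, data):
        version, _ = self._regions[rid]
        self._regions[rid] = (version + 1, bytes(data))
        return version + 1


class DsmClient:
    """One process's view: a local cache refreshed only on start_read/start_write."""

    def __init__(self, home):
        self._home = home
        self.cache = {}  # region id -> [version, bytearray]

    def _refresh(self, rid):
        version, data = self._home.fetch(rid)
        cached = self.cache.get(rid)
        if cached is None or cached[0] != version:
            self.cache[rid] = [version, bytearray(data)]

    def start_read(self, rid):
        self._refresh(rid)  # declare the need to read: get the latest copy
        return bytes(self.cache[rid][1])

    def end_read(self, rid):
        pass  # a real system would release the read declaration here

    def start_write(self, rid):
        self._refresh(rid)  # declare the need to write
        return self.cache[rid][1]  # caller modifies this buffer in place

    def end_write(self, rid):
        entry = self.cache[rid]
        entry[0] = self._home.publish(rid, entry[1])  # write is now complete
```

In this sketch a second client's cached bytes stay unchanged after the first client completes a write, and are replaced only on the second client's next start_read.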

Each storage manager 605 and 615 comprises a DSM node that uses one or more DSM regions (not shown in FIG. 6) located in the address space of the corresponding process 600, 602. These regions contain DVO objects and classes that can be used to represent documents, elements and schema of the XML data that is managed by the storage manager. Portions of documents, usually elements and index sections, are wholly contained within a region. Although the DSM system provides a conceptually uniform node space for sharing regions, there are issues that result in the need to single out a specific node or process to perform special tasks.

Consequently, within the DSM synchronization protocol, a single node is identified as a “home node” for each region. Within the many processes running the storage manager on a single computer, one process, called the “home process”, is the process that performs all disk I/O operations. To reduce the amount of data movement between processes, the home process is the home node for all regions. Other implementations are possible, in which any node may be the home for any region and any process may perform disk I/O. However, for personal computers with a single disk drive, allowing multiple processes to perform disk I/O introduces the need for I/O synchronization while not alleviating the main performance bottleneck, which is the single disk.

In accordance with the DSM operation, if a process has the most recent copy of a region, then it can read and write into the region. Otherwise, the process must request the most recent copy from the home process before it can read and write in the region. Each DSM system 614, 624 interfaces with the message passing system 604 via an interface layer called an internode communication layer (615, 625) which isolates the DSM system from the underlying transport mechanism. It contains methods that send messages to a broadcast group, and manipulate addresses for the corresponding process and the home process.

The inventive storage manager uses shared objects as the basis for XML objects. Many systems exist for sharing objects across processes and computers. One such object-sharing model is based on the use of the shared memory facilities provided by an operating system. One of the biggest drawbacks of such a shared memory model is unreliability due to memory write failures that impact the integrity of other processes. For example, if one process is in the process of updating the state of an object and the process fails before setting the object to a known good state, other processes will either see the object in an invalid state or may be blocked indefinitely waiting for the failed process to release its synchronization locks. The shared memory model also suffers from the locality constraints of shared memory in a tightly coupled multi-computer: it provides no way to share objects over a network.

Another model that provides distributed object sharing and remote method invocation is the basis for the distributed object management facilities in Java or the Object Management Group's CORBA system. Although providing the ability to share objects over a computer network, clients of such systems need to be aware of whether an object is local or remote; objects are not location independent. Performance is another drawback of this approach. All operations on an object need to be transmitted to the object server, since the server contains the only copy of the object state and serves as the synchronization point for that data.

In order to overcome these drawbacks, the inventive storage manager uses a distributed virtual object (DVO) system to provide the primitive data types that XML object types are built upon. The DVO system also provides its callers with the illusion that all data is reliably contained in one process on a single computer node, even though the data may be in multiple processes on many computers or may truly be just in one process on a single computer node.

The DVO object-sharing model is shown in FIG. 7. All processes, on all computers, that are sharing an object have the same method code. For example, process 700 and process 702 in FIG. 7 have copies of the same object. Thus, each of processes 700 and 702 has a copy of the same method code 704 and 706 in the respective process address space. The volatile data state for an object is stored in DSM regions. Thus, the object data 708 for the object copy in process 700 is stored in region 710 in the address space of process 700. Similarly, the object data 712 for the object copy in process 702 is stored in region 714 in the address space of process 702. Object methods synchronize their access to the object's data by using the DSM synchronization functions that synchronize the regions as illustrated by arrow 716. In this manner, DVO objects are location independent, failures are contained within a single process, and multiple changes to a local object do not require data movement across the inter-node transport.

The DVO system provides basic objects that may be used as building blocks to manage XML documents for the storage manager and is divided into three functional pieces. The DVO database 610 contains objects that handle the DVO local context in each process and the shared tables that contain information about open databases and documents contained within those databases. In DVO, “databases” are conceptual storage containers and may channel objects that are ultimately stored in any kind of storage service 609. DVO documents are associated with XML or binary documents, which are visible to a client of the storage manager. DVO documents are also used to contain the indices and metadata associated with a collection.

DVO types 612 is a set of object classes that can be used within DVO documents to implement higher-level data model constructs. DVO types range from simple data containment objects through complex, scalable index structures. Each DVO type is implemented with two classes: one is a “non-shared class” that uses memory pointers in object references and the other is a “shared class” that uses logical addresses, called database pointers, for object references. The “shared class” has two sub-forms: one is the representation of the object in a shared DSM region and the other is the representation of the object stored on-disk in an object store database. The DVO system 607 provides methods to transfer objects between their shared and non-shared implementations.
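The distinction between memory pointers and logical addresses can be illustrated with a small sketch. It is an illustration only, not the DVO implementation: the Node class and the to_shared and from_shared functions are hypothetical, and a list index stands in for a database pointer.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Non-shared form: children are held as direct object references."""
    name: str
    children: list = field(default_factory=list)


def to_shared(root):
    """Flatten a tree into records that refer to each other by position.

    Each record is (name, [child addresses]); an address is an index into the
    returned list, so the records stay meaningful in any address space.
    """
    records = []

    def visit(node):
        address = len(records)
        records.append(None)  # reserve this node's slot before its children
        child_addresses = [visit(child) for child in node.children]
        records[address] = (node.name, child_addresses)
        return address

    visit(root)
    return records


def from_shared(records, address=0):
    """Rebuild the pointer-based form from the address-based records."""
    name, child_addresses = records[address]
    return Node(name, [from_shared(records, a) for a in child_addresses])
```

Because the flattened records contain no process-specific pointers, the same records could be placed in a shared region or written to disk and then converted back in another process.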

The different DVO types are shown in Table 2.

TABLE 2

DVO Type
    Description

Binary Document
    A kind of document that handles binary data.

B-tree Index
    The type of the root of a b-tree index. It contains a description of the index, as well as the address of the root index node.

Btree Node
    A piece of a Btree index which can contain variable numbers of records, sorted by one or more keys.

Collection Document
    A kind of document that handles Collection documents. In addition to the Document methods, it has methods to handle the collection descriptor, indices within the collection, and read marks.

Document
    The base type from which the other document types inherit common methods, such as Open, Close, Create, and Write.

Extendible Hashing
    A type implementation of extendible hashing, as defined in “Extendible Hashing - A Fast Access Method for Dynamic Files”, Ronald Fagin, Jürg Nievergelt, Nicholas Pippenger, H. Raymond Strong. ACM Transactions on Database Systems 4(3), pages 315-344, 1979.

FlatCollectionDocument
    A specific kind of CollectionDocument used in shared regions.

FlatDocument
    A specific kind of XMLDocument used in shared regions.

FlatNode
    A specific kind of Node used in shared regions.

Node
    The type used to store XML elements. It has methods to manage the element name, the element's parent, element content, element attributes, links to other elements, and change notifications.

Ordered Bucket
    A kind of index which supports key ordered sorting (integer, double, string).

Ordered Index
    A type that provides a collated data vector. It has methods for adding, removing, and changing key/data pairs, managing index cursors, and managing parent and sub-indices.

Ordered Index Types
    Data types, called records and fields, that can be stored in ordered indices.

Ordinal Ordered Index
    A kind of index that supports ordinal addressing. It is conceptually similar to a vector that allows any entry to be addressed by position (e.g., vec[14]). In addition to the index methods, it has methods to move entries to specific positions within the index.

Red-Black Index
    A kind of ordered index that implements balancing using the red-black binary tree algorithm.

W32BinaryDocument
    A specific kind of binary document for 32-bit Windows platforms.

XML Document
    A kind of document that handles XML documents. In addition to the Document methods, it has methods to handle schemas and indexes.

The DVO system 607 objects isolate the upper levels of DVO from physical storage and process locality issues. The DVO system objects use DSM for invoking and handling requests to and from the home process. Requests include operations such as opening, closing, and deleting a database, finding documents in a database, and opening, closing, deleting, and writing database documents. The DVO system 607 in the master process 600 can also retrieve DVO objects from a storage service 609. A storage service, such as service 609, is a utility program that stores and retrieves information from a persistent medium and is responsible for the physical integrity of a container, database or file. It ensures that all updates are durable and that all internal data structures (e.g., redirection tables, space allocation maps) are always consistent on disk. Other processes, such as process 602, cannot access the storage service 609 directly, but can access the system indirectly via its DSM regions 624.

The storage manager 605 can operate with different types of physical storage systems, including container or object stores, stream file systems and ZIP files. In order to achieve atomic commits, the object store storage service can be implemented using page-oriented input/output operations and a ping-pong shadow page table.

Individual storage manager methods are atomic. Multiple storage manager operations, even operations on different documents, may be grouped into “transactions.” Transactions not only protect XML data integrity, but they also improve performance because they enable the storage manager to reduce the number of region lock operations and reduce the amount of data movement over the message passing system.

The storage manager supports both read-write and read-only transactions built on DSM synchronization primitives described in the DSM documentation referenced above, which primitives ensure consistency in multiple processes or computers. Read-write transactions provide for the atomicity and consistency of a set of database read and write operations. Each region that is changed as part of a transaction will be kept in a “locked” state until the transaction is committed or aborted. This prevents operations that are not part of the transaction from seeing the changes. Further, each transaction stores a “before image” of the regions it modifies so that, if the transaction is aborted (as a result of an explicit API call or an exception), the effects of the transaction can be undone. Depending on the performance requirements, an alternative implementation would write undo information rather than storing the full “before image.” A read-only transaction uses the same interface as a read-write transaction. A read-only transaction ensures that multiple read operations are consistent. Like other transactions, it uses DSM functions to keep all read regions in a “read state” until it is finished.
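The “before image” technique can be sketched as follows. This is a minimal illustration under stated assumptions, not the storage manager's transaction code: the Transaction class is hypothetical, regions are modeled as byte arrays in a dictionary, and region locking is omitted.

```python
class Transaction:
    """Keeps a copy of each region as it was before the first write to it."""

    def __init__(self, regions):
        self._regions = regions  # region id -> bytearray
        self._before = {}        # region id -> bytes saved before first write

    def write(self, rid, data):
        if rid not in self._before:
            self._before[rid] = bytes(self._regions[rid])  # save before image
        self._regions[rid][:] = data

    def commit(self):
        self._before.clear()  # changes stand; before images are discarded

    def abort(self):
        for rid, image in self._before.items():
            self._regions[rid][:] = image  # undo by restoring the before image
        self._before.clear()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # An exception aborts the transaction; normal exit commits it.
        if exc_type is None:
            self.commit()
        else:
            self.abort()
        return False
```

Used as a context manager, the sketch restores every modified region if the enclosed operations raise an exception, which corresponds to the abort-on-exception behavior described above.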

In addition, checkpoints can be used to ensure that changes are persistent and provide durability for storage manager operations. A checkpoint may be performed at any time. Checkpoints are used in conjunction with data recovery logging. All operations write “redo” information to a sequential recovery log file when they are committed. When the checkpoint is committed, the recovery log file will be flushed to persistent storage and will ensure that the operations can be recovered. Since transactions do not write “redo” information until they are committed, if a checkpoint operation is commenced in the middle of a transaction, the transaction operations will not be flushed.

Transactions are scoped to a thread and a database. Once a transaction is started on a thread for a particular database, that transaction will be automatically used for all subsequent storage manager operations on that database and thread. An extension of conventional operating system threads is used, so that transactions correctly handle calls that need to be marshaled to other threads, for example, a user interface thread, using the Groove system's simple marshaler. Storage manager calls made on a thread and database that doesn't have a transaction started will cause the storage manager to create a “default transaction” that will be committed just before the call ends. Alternatively, starting a new transaction on a thread and database that already has an existing transaction in progress will cause the new transaction to automatically “nest” in the existing transaction. Nested transactions provide the ability to roll back the system within the outer transaction. In particular, inner, nested transactions are not finally committed until the outermost transaction is committed. For example, if a nested transaction is committed, but the containing transaction is later aborted, the nested transaction will be aborted.
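The nesting rule can be illustrated with a short sketch in which each open transaction level keeps its own undo record. The sketch is an assumption-laden illustration, not the storage manager's implementation: the NestedTransactions class is hypothetical, a dictionary stands in for a database, and the per-thread, per-database scoping is left out.

```python
_MISSING = object()  # marks a key that did not exist before the write


class NestedTransactions:
    """One undo record per open level; an inner commit hands its undo data
    to the enclosing level, so an outer abort still undoes inner work."""

    def __init__(self, store):
        self.store = store
        self._undo = []  # one {key: previous value} dict per open level

    def begin(self):
        self._undo.append({})

    def write(self, key, value):
        if not self._undo:
            # No transaction in progress: wrap the call in a default one.
            self.begin()
            self.write(key, value)
            self.commit()
            return
        self._undo[-1].setdefault(key, self.store.get(key, _MISSING))
        self.store[key] = value

    def commit(self):
        finished = self._undo.pop()
        if self._undo:
            # Inner commit is provisional: the parent inherits the undo data.
            for key, old in finished.items():
                self._undo[-1].setdefault(key, old)

    def abort(self):
        for key, old in self._undo.pop().items():
            if old is _MISSING:
                self.store.pop(key, None)
            else:
                self.store[key] = old
```

In this sketch, committing an inner level and then aborting the outer level returns the store to its state before the outer level began, matching the example given above.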

In a preferred embodiment of the invention, the storage manager is implemented in an object-oriented environment. Accordingly, both the storage manager itself and all of the document components, such as documents, elements, entities, etc. are implemented as objects. These objects, their interface, the underlying structure and the API used to interface with the storage manager are illustrated in FIG. 8. The API is described in more detail in connection with FIGS. 9-11. Referring to FIG. 8, the storage manager provides shared access to documents, via the document manipulation API 802, but, in order to enable a full programming model for client applications, additional communication and synchronization operations are provided, within the context of a document. For example, the storage manager provides queued element operations, which enable one process to send an element to another process via the Queue API 804. Elements can be sent by value (a copy of the whole element) or by reference to the element. Synchronization operations are also provided to allow one or more threads to wait for an element to be enqueued to a given queue. The storage manager also provides RPC-style element communication and synchronization, via the RPC API 804.

Other client components may need to be aware of when documents are created in or deleted from the storage manager. Accordingly, the storage manager provides an interface to an interest-based notification system for those client components via notification API 800. The notification system 806 provides notifications to client components that have registered an interest when a document is created or deleted.

Document data is represented by a collection of objects including database objects, document objects, element objects and schema objects 808. The objects can be directly manipulated by means of the document manipulation API 802.

The document related objects 808 are actually implemented by the distributed virtual object system 810 that was discussed in detail above. The distributed virtual object system 810 can also be manipulated by element queue and RPC objects 812 under control of the queue and RPC API 804.

The distributed virtual object system 810 communicates with the distributed shared memory via interface 814 and communicates with the logging operations via interface 816. Similarly, the distributed virtual object system can interact with the storage services via interface 818.

The following is a description of the interfaces for each of the objects used to implement a preferred embodiment of the inventive storage manager. These objects are designed in accordance with the Component Object Model (COM) promulgated by Microsoft Corporation, Redmond, Wash., and can be manipulated in memory as COM objects. However, COM is just one object model and one set of interface methodologies. The invention could also be implemented using other styles of interface and object models, including but not limited to the Java and CORBA object models.

FIG. 9 illustrates object interfaces for a storage manager object. An interface 900 (IGrooveStorageManager) encapsulates the basic framework for the storage manager. This interface is a subclass of an IDispatch interface which is a common class defined by the COM model. Table 3 defines the methods included in the storage manager interface.

TABLE 3
Interface IGrooveStorageManager : IDispatch

CreateDatabase (BSTR i_DatabaseURI, VARIANT_BOOL i_Temporary, VARIANT_BOOL i_SingleProcess, IUnknown * i_pSecurityContext, VARIANT_BOOL i_CreateOnCheckpoint, IGrooveDatabase ** o_ppDatabase);
    Creates a database. A database can be either temporary or permanent, and single or multi-process. The DatabaseURI specifies the location of the database.
CreateOrOpenDatabase (BSTR i_DatabaseURI, VARIANT_BOOL i_Temporary, VARIANT_BOOL i_SingleProcess, IUnknown * i_pSecurityContext, VARIANT_BOOL i_CreateOnCheckpoint, VARIANT_BOOL * o_pCreated, IGrooveDatabase ** o_ppDatabase);
    Creates a new database or opens an existing database.
CreateTemporaryElement (BSTR i_Name, IUnknown * i_pParent, IGrooveElement ** o_ppElement);
    Creates a temporary element.
CreateTemporaryXMLDocument (BSTR i_NamePrefix, BSTR i_SchemaURI, IUnknown * i_pAdditionalSchemaURIs, IGrooveXMLDocument ** o_ppXMLDocument);
    Creates an empty temporary document with a unique URI.
CreateTransform (BSTR i_CollectionDescriptorURI, BSTR i_SecondaryDescriptorURI, BSTR i_CollectionDescriptorName, IGrooveTransform ** o_ppTransfom);
    Creates a transformation interface.
DeleteDatabase (BSTR i_DatabaseURI);
    Deletes a database.
IsHomeProcess (VARIANT_BOOL * o_pHomeProcess);
    Determines whether the current process is the home process.
OpenCrossProcessSemaphore (BSTR i_Name, VARIANT_BOOL i_Reentrant, IGrooveCrossProcessSemaphore ** o_ppSemaphore);
    Creates a semaphore object that can be used to synchronize activity in different processes. If the semaphore is not Reentrant, repeated attempts to lock the semaphore within the same thread and process will block.
OpenDatabase (BSTR i_DatabaseURI, VARIANT_BOOL i_SingleProcess, IUnknown * i_pSecurityContext, IGrooveDatabase ** o_ppDatabase);
    Opens an existing database.
OpenDatabaseURIEnum (IGrooveBSTREnum ** o_ppDatabaseURI);
    Returns an enumeration of the databases that are currently open.

Another interface 902 (IGrooveStorageURISyntax) is used by a client of a storage manager that needs to perform operations on parts of standard names, which are in the form of Uniform Resource Identifiers (URIs). Table 4 includes the methods for the IGrooveStorageURISyntax interface.

TABLE 4
Interface IGrooveStorageURISyntax : IDispatch

BuildDatabaseURI (BSTR i_ServiceName, BSTR i_DatabasePath, VARIANT_BOOL i_Relative, BSTR * o_pURI);
    Builds a database URI from its pieces.
BuildDocumentURI (BSTR i_ServiceName, BSTR i_DatabasePath, BSTR i_DocumentName, VARIANT_BOOL i_Relative, BSTR * o_pURI);
    Builds a document URI from its pieces.
MakeAbsolute (BSTR i_RelativeURI, BSTR * o_pAbsoluteURI);
    Given a relative URI within the scope of this database, returns an absolute URI.
MakeRelative (BSTR i_AbsoluteURI, BSTR * o_pRelativeURI);
    Given an absolute URI within this database, returns a relative URI within the scope of this database.
OpenDatabasePath (BSTR i_URI, BSTR * o_pDatabasePath);
    Returns the directory path portion of a URI.
OpenDocumentName (BSTR i_URI, BSTR * o_pDocumentName);
    Returns the document name portion of a URI.
OpenPersistRootPath (BSTR * o_pPath);
    Returns the directory path to the root of the Groove persistent data directories.
OpenServiceName (BSTR i_URI, BSTR * o_pServiceName);
    Returns the storage service portion of a URI.
Parse (BSTR i_URI, BSTR * o_pServiceName, BSTR * o_pDatabasePath, BSTR * o_pDocumentName);
    Parses the pieces of the given URI.

FIG. 10 illustrates the notification system interfaces. Interface 1000 (IGrooveLinkCallback) is an interface for use by a client of a storage manager that needs to be notified during the input processing of an XML document or element when a definition for a link is found. The interface includes the methods defined in Table 5.

TABLE 5
Interface IGrooveLinkCallback : IDispatch

HandleLink (IGrooveElement * i_pLinkElement, IGrooveByteInputStream * i_pLinkData);
    Called when the specified element contains a link attribute definition.

Another interface 1002 (IGrooveRPCServerCallback) is used by a client of a storage manager that needs to handle remote procedure calls (RPCs) on elements within XML documents. RPC server callbacks are a sub-class of the “util” base class (described below), that is, all of the methods for IGrooveElementUtilBase also apply to IGrooveRPCServerCallback. Table 6 defines the methods used in the storage manager RPC server callback interface.

TABLE 6
Interface IGrooveElementRPCServerCallback : IDispatch

HandleCall (IGrooveElement * i_pInput, IGrooveElement ** o_ppOutput);
    Handles an RPC, receiving input parameters in the Input element and returning output parameters in the Output element.

FIGS. 11, 12 and 13 illustrate the document manipulation interfaces and the queue and RPC interfaces. In particular, FIG. 11 shows the interfaces used to manipulate databases. An interface 1100 (IGrooveDatabase) is used by a client of a storage manager that needs to manage the databases in which documents are stored. It includes the methods in Table 7.

TABLE 7
Interface IGrooveDatabase : IDispatch

Checkpoint ( );
    Creates a durable point of state for the database.
ClearDataLost ( );
    Clears the database flag that indicates data may have been lost since the database was opened or the last transaction was committed.
CreateBinaryDocumentFromStream (IGrooveByteInputStream * i_pStream, BSTR i_DocumentName, IGrooveBinaryDocument ** o_ppDocument);
    Creates a binary document with the specified name in the database.
CreateOrOpenXMLDocument (BSTR i_DocumentName, BSTR i_RootElementName, BSTR i_SchemaURI, IUnknown * i_pAdditionalSchemaURIs, VARIANT_BOOL * o_pCreated, IGrooveXMLDocument ** o_ppDocument);
    Opens the specified XML document; creates an empty document with the specified name and schema if it doesn't already exist.
CreateXMLDocument (BSTR i_DocumentName, BSTR i_RootElementName, BSTR i_SchemaURI, IUnknown * i_pAdditionalSchemaURIs, IGrooveXMLDocument ** o_ppDocument);
    Creates an empty XML document with the specified name and schema in the database.
CreateXMLDocumentFromStream (IGrooveByteInputStream * i_pStream, GrooveParseOptions i_ParseOptions, BSTR i_DocumentName, BSTR i_SchemaURI, IUnknown * i_pAdditionalSchemaURIs, IUnknown * i_pLinkCallback, IGrooveXMLDocument ** o_ppDocument);
    Given a stream of bytes, representing one of the supported character set encodings of an XML document, creates an XML document in the database.
DeleteDocument (BSTR i_DocumentName);
    Deletes the named document.
DocumentExists (BSTR i_DocumentName, VARIANT_BOOL * o_pDocumentExists);
    Given the specified document name, checks for the existence of the document in the database.
IsTransactionInProgress (VARIANT_BOOL * o_pTransactionInProgress);
    Returns TRUE if a transaction is in progress.
OpenBinaryDocument (BSTR i_DocumentName, IGrooveBinaryDocument ** o_ppDocument);
    Opens the specified binary document.
OpenCrossProcessSemaphore (BSTR i_Name, VARIANT_BOOL i_Reentrant, IGrooveCrossProcessSemaphore ** o_ppSemaphore);
    Creates a new cross process synchronization object. If Name is not specified, the default name for the database is used. If the semaphore is not Reentrant, repeated attempts to lock the semaphore within the same thread and process will block.
OpenDocumentNameEnum (VARIANT_BOOL i_OpenOnly, IGrooveBSTREnum ** o_ppDocumentNames);
    Returns an enumeration of the documents currently in a database.
OpenTransaction (VARIANT_BOOL i_BeginLock, VARIANT_BOOL i_ReadOnly, VARIANT_BOOL i_BeginTransaction, VARIANT_BOOL i_Reentrant, BSTR i_LockName, IGrooveTransaction ** o_ppTransaction);
    Creates a new transaction on the database. BeginLock specifies whether the database cross process semaphore should be locked. BeginTransaction specifies whether the transaction should start now. If LockName is not specified, the default name for the database is used. If the semaphore is not Reentrant, repeated attempts to lock the semaphore within the same thread and process will block.
OpenURI (BSTR * o_pDatabaseURI);
    Returns the URI for this database.
OpenXMLDocument (BSTR i_DocumentName, IGrooveXMLDocument ** o_ppDocument);
    Opens the specified XML document.
WasDataLost (VARIANT_BOOL * o_pDataLost);
    Returns the value of a flag indicating whether data may have been lost since the database was opened or the last transaction was committed.

Table 8 illustrates the methods for an interface 1102 (IGrooveCrossProcessSemaphore) for a client of a storage manager that needs to synchronize access among processes.

TABLE 8
Interface IGrooveCrossProcessSemaphore : IDispatch

DoLock (VARIANT_BOOL i_ReadOnly);
    Locks the semaphore. If ReadOnly is TRUE, only retrieval operations may be performed on the database; otherwise, any operation may be performed.
DoUnlock ( );
    Unlocks the semaphore.
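The effect of the Reentrant option described for OpenCrossProcessSemaphore can be illustrated with Python's standard in-process locks. This is an analogy only: `threading.Lock` and `threading.RLock` synchronize threads within one process, whereas the semaphore described above operates across processes. Non-blocking acquisition is used so that the sketch reports what would happen rather than actually blocking.

```python
import threading

# Non-reentrant lock: a second lock attempt by the same thread cannot succeed.
non_reentrant = threading.Lock()
non_reentrant.acquire()
# blocking=False returns immediately instead of blocking forever.
second_non_reentrant = non_reentrant.acquire(blocking=False)
non_reentrant.release()

# Reentrant lock: the owning thread may lock again, and must unlock once per lock.
reentrant = threading.RLock()
reentrant.acquire()
second_reentrant = reentrant.acquire(blocking=False)
reentrant.release()
reentrant.release()
```

With a blocking call, the second attempt on the non-reentrant lock would wait indefinitely, which corresponds to the statement that repeated attempts to lock within the same thread and process will block.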

Table 9 illustrates an interface 1104 (IGrooveTransaction) for a client of a storage manager that needs to group operations within a database. Transactions are a sub-class of cross-process semaphores, that is, all of the methods for IGrooveCrossProcessSemaphore also apply to IGrooveTransaction. The storage manager transaction interface includes the following methods:

TABLE 9
Interface IGrooveTransaction : IGrooveCrossProcessSemaphore

Abort ( );
    Ends the transaction. All work done to the database since the start of the transaction is discarded.
Begin (VARIANT_BOOL i_ReadOnly);
    Starts a transaction. If ReadOnly is false, the database may be updated.
BeginIndependent (VARIANT_BOOL i_ReadOnly);
    Starts another transaction for this thread. Only one independent transaction is allowed per thread.
Commit ( );
    Ends the transaction. All work done to the database since the start of the transaction is reliably stored in the database.
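The commit and abort semantics described in Table 9 can be sketched with a simple snapshot model. The class `DatabaseModel` below is hypothetical and keeps everything in memory; it does not model locking, independent transactions, or durable storage, and is not the implementation of IGrooveTransaction.

```python
import copy


class DatabaseModel:
    """Toy database that discards or keeps work done inside a transaction."""

    def __init__(self):
        self.documents = {}
        self._snapshot = None

    def begin(self):
        if self._snapshot is not None:
            raise RuntimeError("transaction already in progress")
        # Remember the state at the start of the transaction.
        self._snapshot = copy.deepcopy(self.documents)

    def is_transaction_in_progress(self):
        return self._snapshot is not None

    def commit(self):
        self._require_transaction()
        # Keep all work done since begin().
        self._snapshot = None

    def abort(self):
        self._require_transaction()
        # Discard all work done since begin().
        self.documents = self._snapshot
        self._snapshot = None

    def _require_transaction(self):
        if self._snapshot is None:
            raise RuntimeError("no transaction in progress")


database = DatabaseModel()

database.begin()
database.documents["a.xml"] = "<a/>"
database.commit()

database.begin()
database.documents["b.xml"] = "<b/>"
database.abort()
```

After these steps only the document written in the committed transaction remains in the model.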

FIG. 12 shows interfaces which allow clients of the storage manager to manipulate documents and elements within those documents. Table 10 illustrates an interface 1200 (IGrooveDocument) for a client of a storage manager that needs to manage documents within a database. The storage manager document interface includes the following methods:

TABLE 10
Interface IGrooveDocument : IDispatch

OpenCrossProcessSemaphore (BSTR i_Name, VARIANT_BOOL i_Reentrant, IGrooveCrossProcessSemaphore ** o_ppSemaphore);
    Creates a new cross process synchronization object. If Name is not specified, the URI for the document is used. If the semaphore is not Reentrant, repeated attempts to lock the semaphore within the same thread and process will block.
OpenDatabase (IGrooveDatabase ** o_ppDatabase);
    Returns an interface to the database object that contains this document.
OpenName (BSTR * o_pDocumentName);
    Returns the document name.
OpenURI (BSTR * o_pURI);
    Returns the URI that identifies this document.

Table 11 illustrates an interface 1202 (IGrooveXMLDocument) for a client of a storage manager that needs to manage XML documents within a database. XML documents are a sub-class of documents, that is, all of the methods for IGrooveDocument also apply to IGrooveXMLDocument. The storage manager XML document interface includes the following methods:

TABLE 11
Interface IGrooveXMLDocument : IGrooveDocument

GenerateGrooveID (BSTR i_GrooveIDBase, double * o_pGrooveID);
    Generates an 8 byte identifier from the string identifier i_GrooveIDBase.
ConvertGrooveIDToSerializedGrooveID (double i_GrooveID, BSTR * o_pGrooveIDString);
    Converts an 8 byte identifier to the string i_GrooveID.
ConvertSerializedGrooveIDToGrooveID (BSTR i_GrooveIDString, double * o_pGrooveID);
    Converts a string version of a Groove identifier to an 8 byte version.
CreateElement (BSTR i_Name, IUnknown * i_pParent, IGrooveElement ** o_ppElement);
    Creates a new element with the supplied Tag; the tag cannot be altered once created. If a Parent reference is supplied, the new element is created as a child of that parent.
CreateElementCopy (IGrooveElement * i_pSource, IGrooveElement * i_pParent, VARIANT_BOOL i_ShallowCopy, IGrooveElement ** o_ppElement);
    Does a deep/shallow copy of the specified element and all of its children (recursively for deep; just the one level for shallow), putting the new element(s) in under the Parent element.
CreateElementFromSchema (BSTR i_Name, IGrooveElement * i_pParent, IGrooveElement ** o_ppElement);
    Creates an element that conforms to the element's definition in the schema. Creates the element, its attributes, and any child elements.
CreateElementFromStream (IGrooveByteInputStream * i_pStream, GrooveParseOptions i_ParseOptions, IUnknown * i_pParent, IUnknown * i_pLinkCallback, IGrooveElement ** o_ppElement);
    Using a parser, creates an element, reads from a byte input stream and creates elements and attributes from the text stream as necessary, inserting them into the element, which is then returned to the caller. If a Parent reference is supplied, the new element is created as a child of that parent.
CreateLocator (IGrooveLocator ** o_ppLocator);
    Returns the interface to a new locator object.
FindElementByID (BSTR i_ID, IGrooveElement ** o_ppElement, VARIANT_BOOL * o_pFound);
    Looks for an element of the specified ID and returns a boolean value if found.
OpenElementByID (BSTR i_ID, IGrooveElement ** o_ppElement);
    Looks for an element of the specified ID.
OpenElementEnumByAttributeValue (BSTR i_ElementName, BSTR i_AttributeName, BSTR i_AttributeValue, IGrooveElementEnum ** o_ppElementEnum);
    Returns an enumeration of all of the elements within the document that have the named attribute with the specified value.
OpenElementEnumByAttributeValueAsBool (BSTR i_ElementName, BSTR i_AttributeName, VARIANT_BOOL i_AttributeValue, IGrooveElementEnum ** o_ppElementEnum);
    Returns an enumeration of all of the elements within the document that have the named attribute with the specified boolean type value.
OpenElementEnumByAttributeValueAsDouble (BSTR i_ElementName, BSTR i_AttributeName, double i_AttributeValue, IGrooveElementEnum ** o_ppElementEnum);
    Returns an enumeration of all of the elements within the document that have the named attribute with the specified double floating type value.
OpenElementEnumByAttributeValueAsLong (BSTR i_AttributeName, long i_AttributeValue, IGrooveElementEnum ** o_ppElementEnum);
    Returns an enumeration of all of the elements within the document that have the named attribute with the specified long integer type value.
OpenElementEnumByLocator (BSTR i_LocatorText, IGrooveElementEnum ** o_ppElementEnum);
    Returns an element enumerator with references to all elements satisfying the specified element locator expression. If there are no matching elements, the element enumerator will be created with no contents.
OpenElementEnumByName (BSTR i_Name, IGrooveElementEnum ** o_ppElementEnum);
    Returns an enumeration of all of the elements within the document that have the specified tag name.
OpenMetaElement (IGrooveElement ** o_ppElement);
    Returns the interface to the meta element that defines this XML document.
OpenRootElement (IGrooveElement ** o_ppRootElement);
    Opens the root element for the XML document.
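As an illustration of the enumeration behavior described for OpenElementEnumByAttributeValue, the following Python sketch walks an ordinary XML tree with the standard `xml.etree.ElementTree` module. The function name is a hypothetical Python rendering of the method and the sample document is invented for the example; text-valued attributes only are modeled.

```python
import xml.etree.ElementTree as ET


def open_element_enum_by_attribute_value(root, element_name, attribute_name, attribute_value):
    """Yield every element with the given tag whose attribute has the given value."""
    # iter() visits the whole tree in document order, at any depth.
    for element in root.iter(element_name):
        if element.get(attribute_name) == attribute_value:
            yield element


root = ET.fromstring(
    '<tasks>'
    '<task id="1" state="open"/>'
    '<group><task id="2" state="done"/><task id="3" state="open"/></group>'
    '</tasks>'
)

open_ids = [
    element.get("id")
    for element in open_element_enum_by_attribute_value(root, "task", "state", "open")
]
no_match = list(open_element_enum_by_attribute_value(root, "task", "state", "blocked"))
```

When nothing matches, the enumeration in this sketch is simply empty, which parallels the statement that an enumerator is created with no contents.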

Table 12 illustrates the methods for an interface 1204 (IGrooveBinaryDocument) for a client of a storage manager that needs to manage binary documents within a database. Binary documents are a sub-class of documents, that is, all of the methods for IGrooveDocument also apply to IGrooveBinaryDocument.

TABLE 12
Interface IGrooveBinaryDocument : IGrooveDocument

OpenByteInputStream (IGrooveByteInputStream ** o_ppByteInputStream);
    Returns the interface to a byte stream object that can be used to read bytes within the binary document.

Table 13 illustrates an interface 1206 (IGrooveLocator) for a client of a storage manager that needs to search for elements using locator queries as defined in a specification called XSLT. Details of the XSLT specification can be found at web address www.w3.org/TR/xslt. The storage manager locator interface includes the following methods:

TABLE 13
Interface IGrooveLocator : IDispatch

FindElement (BSTR i_LocatorStr, IGrooveElement * i_pContextElement, IGrooveElement ** o_ppElement, VARIANT_BOOL * o_pFound);
    Returns an interface to the element object that satisfies the search specified by the Locator string within the scope of the context element.
Invalidate (VARIANT_BOOL i_AssignNewIDs);
    Clears the state information in the interface instance.
OpenElementEnum (BSTR i_LocatorStr, IGrooveElement * i_pContextElement, VARIANT_BOOL i_Sort, BSTR i_SortConstraint, BSTR i_SortKey, GrooveSortOrder i_SortOrder, IGrooveElementEnum ** o_ppElements);
    Returns an enumerator of all elements that match the Locator string, collated according to the specified sorting criteria.
OpenElementEnumWithTumblers (BSTR i_LocatorStr, IGrooveElement * i_pContextElement, VARIANT_BOOL i_RelativeTumblers, IGrooveBSTREnum ** o_ppTumblers, VARIANT_BOOL i_Sort, BSTR i_SortConstraint, BSTR i_SortKey, GrooveSortOrder i_SortOrder, IGrooveElementEnum ** o_ppElements);
    Performs the search specified by the Locator string on the elements pointed to by the context element, returning the tumbler values for each match as well as the matching elements, collated according to the specified sorting criteria.
OpenText (BSTR i_LocatorStr, IGrooveElement * i_pContextElement, BSTR * o_pValue);
    Returns the text from the element or attribute that satisfies the search specified by the Locator string within the scope of the context element.
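A locator search scoped to a context element can be approximated with the limited path syntax supported by Python's standard `xml.etree.ElementTree` module. The sketch below is an analogy under stated assumptions: the helper names are hypothetical renderings of FindElement and OpenElementEnum, the sample document is invented, sorting is reduced to ordering by a single attribute, and the path expressions accepted by ElementTree are only a subset of what a full locator language provides.

```python
import xml.etree.ElementTree as ET

document = ET.fromstring(
    '<library>'
    '<shelf name="a">'
    '<book id="b2"><title>COM Notes</title></book>'
    '<book id="b1"><title>XML Basics</title></book>'
    '</shelf>'
    '<shelf name="b"><book id="b3"><title>Queues</title></book></shelf>'
    '</library>'
)


def find_element(context, locator):
    """Return (element, found); element is None when nothing matches."""
    element = context.find(locator)
    return element, element is not None


def open_element_enum(context, locator, sort_key=None):
    """Return all matches under the context element, optionally sorted by an attribute."""
    matches = context.findall(locator)
    if sort_key is not None:
        matches = sorted(matches, key=lambda element: element.get(sort_key, ""))
    return matches


book, found = find_element(document, ".//book[@id='b2']")
missing, missing_found = find_element(document, ".//book[@id='zz']")

# The same locator gives different results depending on the context element.
shelf_a = document.find("shelf[@name='a']")
ids_on_shelf_a = [b.get("id") for b in open_element_enum(shelf_a, "book", sort_key="id")]
title = book.findtext("title")
```

The context element limits the scope of the search: the book on the second shelf is not returned when the first shelf is used as the context.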

Table 14 illustrates an interface 1208 (IGrooveTransform) for a client of a storage manager that needs to perform XML document transformations as defined in XSLT. The storage manager transform interface includes the following methods:

TABLE 14
Interface IGrooveTransform : IDispatch

TransformXMLDocument (IGrooveXMLDocument * i_pXMLDocument, IGrooveElement * i_pStartElement, BSTR i_SortRule, long i_StartElementNum, long i_NumElements, IGrooveXMLDocument * io_pResultDocument, VARIANT_BOOL i_AlwaysOutputHeader, long * o_pElementsProcessed);
    Transforms the input XML document, returning the result of the transformation in ResultDocument.
TransformElement (IGrooveElement * i_pContextElement, BSTR i_TansformationTemplate, IGrooveXMLDocument ** o_ppResultDocument);
    Transforms the input ContextElement, returning the result of the transformation in ResultDocument.

Table 15 illustrates an interface 1210 (IGrooveElement) which allows a client of a storage manager to manipulate elements within XML documents. The storage manager element interface includes the following methods:

TABLE 15
Interface IGrooveElement : IDispatch

AppendContent (BSTR i_Text, GrooveContentType i_Type);
    Inserts the kind of content as the last of its type within this element.
AppendContentElement (IGrooveElement * i_pElement);
    Inserts the element as the last content element.
AppendContentProcessingInstruction (BSTR i_Target, BSTR i_Text);
    Inserts a processing instruction, with target Target, as the last processing instruction.
CreateElement (BSTR i_Name, IGrooveElement * i_pParent, IGrooveElement ** o_ppElement);
    Creates a new element in the same document.
CreateElementCopy (IGrooveElement * i_pSource, IGrooveElement * i_pParent, VARIANT_BOOL i_ShallowCopy, IGrooveElement ** o_ppElement);
    Does a deep/shallow copy of the specified element and all of its children (recursively for deep; just the one level for shallow), putting the new element(s) in the destination document. The returned element must be attached into the document's element tree.
CreateElementFromSchema (BSTR i_Name, IGrooveElement * i_pParent, IGrooveElement ** o_ppElement);
    Creates an element that conforms to the element's definition in the schema. Creates the element, its attributes, and any child elements.
CreateElementRPCClient (IGrooveElementRPCClient ** o_ppRPCClient);
    Creates and returns the interface to the element RPC client.
CreateElementRPCServer (IGrooveElementRPCServer ** o_ppRPCServer);
    Creates and returns the interface to the element RPC server.
CreateElementRPCServerThread (IGrooveElementRPCServerCallback * i_pCallback, IGrooveElementRPCServerThread ** o_ppRPCServerThread);
    Creates and returns the interface to the element RPC server thread.
CreateLink (IGrooveDocument * i_pDocument, BSTR i_Title, BSTR i_Role, GrooveXLinkShow i_Show, GrooveXLinkActuate i_Actuate, GrooveXLinkSerialize i_Serialize);
    Creates a link to another document, using the specified XLink parameters.
DecrementAttributeAsLong (BSTR i_Name, long * o_pOldValue);
    Subtracts 1 from the value of a long integer type attribute.
Delete ( );
    Permanently removes the element from the document. No further operations may be performed on a deleted element.
DeleteAllAttributes ( );
    Removes all attributes from the element.
DeleteAllContent ( );
    Removes all child content elements and text from the element and deletes them from the document.
DeleteAttribute (BSTR i_Name);
    Removes the named attribute from the element.
DeleteContent (long i_Ordinal);
    Removes the content at the specified position from the element.
DeleteLinkAttributes ( );
    Removes all attributes that are links from the element.
DetachFromParent ( );
    Removes this element from the content of its parent. The element is still part of the document and must be reattached or destroyed before it is released.
DoesAttributeExist (BSTR i_Name, VARIANT_BOOL * o_pFound);
    Returns whether the attribute is set on the element.
Duplicate (IGrooveElement * i_pTargetElement, VARIANT_BOOL i_ShallowDuplicate);
    Makes the specified target element a duplicate of this element, overriding attributes and, if ShallowDuplicate is FALSE, all descendent elements.
FindAttribute (BSTR i_Name, BSTR * o_pValue, VARIANT_BOOL * o_pFound);
    Gets any arbitrary attribute as text. If the attribute is not in the element, Found is FALSE and no value is returned.
FindAttributeAsBinary (BSTR i_Name, IGrooveByteInputStream ** o_ppValue, VARIANT_BOOL * o_pFound);
    Gets any arbitrary attribute as Binary. The attribute must have been set as the given type or be specified as that type in the document schema. If the attribute is not in the element, Found is FALSE and no value is returned.
FindAttributeAsBinaryArray (BSTR i_Name, SAFEARRAY(BYTE) * o_ppValue, VARIANT_BOOL * o_pFound);
    Gets any arbitrary attribute as Binary and returns the value in an array. The attribute must have been set as the given type or be specified as that type in the document schema. If the attribute is not in the element, Found is FALSE and no value is returned.
FindAttributeAsBinaryToStream (BSTR i_Name, IGrooveByteOutputStream * i_pStream, VARIANT_BOOL * o_pFound);
    Gets any arbitrary attribute as Binary and returns the value in a stream. The attribute must have been set as the given type or be specified as that type in the document schema. If the attribute is not in the element, Found is FALSE and no value is returned.
FindAttributeAsBool (BSTR i_Name, VARIANT_BOOL * o_pValue, VARIANT_BOOL * o_pFound);
    Gets any arbitrary attribute as Boolean. The attribute must have been set as the given type or be specified as that type in the document schema. If the attribute is not in the element, Found is FALSE and no value is returned.
FindAttributeAsDouble (BSTR i_Name, double * o_pValue, VARIANT_BOOL * o_pFound);
    Gets any arbitrary attribute as Double. The attribute must have been set as the given type or be specified as that type in the document schema. If the attribute is not in the element, Found is FALSE and no value is returned.
FindAttributeAsGrooveID (BSTR i_Name, double * o_pValue, VARIANT_BOOL * o_pFound);
    Gets any arbitrary attribute as a Groove identifier. The attribute must have been set as the given type or be specified as that type in the document schema. If the attribute is not in the element, Found is FALSE and no value is returned.
FindAttributeAsLong (BSTR i_Name, long * o_pValue, VARIANT_BOOL * o_pFound);
    Gets any arbitrary attribute as Long. The attribute must have been set as the given type or be specified as that type in the document schema. If the attribute is not in the element, Found is FALSE and no value is returned.
FindAttributeAsVARIANT (BSTR i_Name, VARIANT * o_pValue, VARIANT_BOOL * o_pFound);
    Gets any arbitrary attribute as a variant value. If the attribute is not in the element, Found is FALSE and no value is returned.
FindContentElementByName (BSTR i_Name, IGrooveElement ** o_ppElement, VARIANT_BOOL * o_pFound);
    Within the context of this element, finds an element with the specified tag name. If the element is not found, Found is FALSE and no element reference is returned.
FindContentElementByNameAndAttribute (BSTR i_Name, BSTR i_AttributeName, BSTR i_AttributeValue, IGrooveElement ** o_ppElement, VARIANT_BOOL * o_pFound);
    Within the context of this element, finds an element with the specified tag name and attribute name with the specified attribute value. If the element is not found, Found is FALSE and no element reference is returned.
FindParent (IGrooveElement ** o_ppParent, VARIANT_BOOL * o_pFound);
    Gets an object's parent element. An element can have only a single parent and may only be referenced from a single content entry of a single element. If the element does not have a parent, Found is FALSE and no value is returned.
GetActuate (GrooveXLinkActuate * o_pActuate);
    Returns the value of the Actuate parameter in this element's link attribute.
GetAttributeCount (long * o_pCount);
    Returns the number of attributes an element has.
GetContentCount (long * o_pCount);
    Returns the number of content and text entries in this element.
GetContentType (long i_Ordinal, GrooveContentType * o_pType);
    Returns the type of content at the specified ordinal position.
GetOrdinal (long * o_pOrdinal);
    Gets the ordinal position within the parent's content of this element.
GetSerialize (GrooveXLinkSerialize * o_pSerialize);
    Returns the value of the Serialize parameter in this element's link attribute.
GetShow (GrooveXLinkShow * o_pShow);
    Returns the value of the Show parameter in this element's link attribute.
IncrementAttributeAsLong (BSTR i_Name, long * o_pOldValue);
    Adds 1 to the value of a long integer type attribute.
InsertContent (long i_Ordinal, BSTR i_Text, GrooveContentType i_Type);
    Inserts the text entry at the specified ordinal location.
InsertContentElement (long i_Ordinal, IGrooveElement * i_pElement);
    Inserts the element at the specified ordinal location.
InsertContentProcessingInstruction (long i_Ordinal, BSTR i_Target, BSTR i_Text);
    Inserts a Text processing instruction, with target Target, at the specified ordinal position.
IsLinkElement (VARIANT_BOOL * o_pIsLink);
    Determines whether or not the element contains XLink markup.
IsReferenced (VARIANT_BOOL * o_pIsReferenced);
    Returns TRUE if this element is referenced.
IsSame (IGrooveElement * i_pElement, VARIANT_BOOL * o_pIsSame);
    Returns TRUE if the specified element object is this element or equal to this element.
OpenAttribute (BSTR i_Name, BSTR * o_pValue);
    Gets any arbitrary attribute as text.
OpenAttributeAsBinary (BSTR i_Name, IGrooveByteInputStream ** o_ppValue);
    Gets any arbitrary attribute as Binary. The attribute must have been set as the given type or be specified as that type in the document schema.
OpenAttributeAsBinaryArray (BSTR i_Name, SAFEARRAY(BYTE) * o_ppValue);
    Gets any arbitrary attribute as Binary and returns the value in an array. The attribute must have been set as the given type or be specified as that type in the document schema.
OpenAttributeAsBinaryToStream (BSTR i_Name, IGrooveByteOutputStream * i_pStream);
    Gets any arbitrary attribute as Binary and returns the value in a stream. The attribute must have been set as the given type or be specified as that type in the document schema.
OpenAttributeAsBool (BSTR i_Name, VARIANT_BOOL * o_pValue);
    Gets any arbitrary attribute as Boolean. The attribute must have been set as the given type or be specified as that type in the document schema.
OpenAttributeAsDouble (BSTR i_Name, double * o_pValue);
    Gets any arbitrary attribute as Double. The attribute must have been set as the given type or be specified as that type in the document schema.
OpenAttributeAsGrooveID (BSTR i_Name, double * o_pValue);
    Gets any arbitrary attribute as a Groove identifier. The attribute must have been set as the given type or be specified as that type in the document schema.
OpenAttributeAsLong (BSTR i_Name, long * o_pValue);
    Gets any arbitrary attribute as Long. The attribute must have been set as the given type or be specified as that type in the document schema.
OpenAttributeAsVARIANT (BSTR i_Name, VARIANT * o_pValue);
    Gets any arbitrary attribute as a variant value.
OpenAttributeEnum (IGrooveStringStringEnum ** o_ppAttributes);
    Enumerates all of the element's attributes as text.
OpenAttributeVariantEnum (IGrooveNameValueEnum ** o_ppEnum);
    Enumerates all of the element's attributes as variant data types.
OpenBoundCode (IGrooveBoundCode ** o_ppBoundCode);
    Returns an instance of the object bound to the element.
OpenContentComment (long i_Ordinal, BSTR * o_pComment);
    Returns the text of the comment that is contained in this element at the specified Ordinal position.
OpenContentElement (long i_Ordinal, IGrooveElement ** o_ppElement);
    Returns the child element interface that is contained in this element at the specified Ordinal position.
OpenContentElementByName (BSTR i_Name, IGrooveElement ** o_ppElement);
    Within the context of this element, finds an element with the specified tag name and returns its interface.
OpenContentElementByNameAndAttribute (BSTR i_Name, BSTR i_AttributeName, BSTR i_AttributeValue, IGrooveElement ** o_ppElement);
    Within the context of this element, finds an element with the specified tag name and attribute name with the specified attribute value.
OpenContentElementEnum (IGrooveElementEnum ** o_ppElements);
    Returns an enumeration of all child content elements (non-recursively).
OpenContentElementEnumByName (BSTR i_Name, IGrooveElementEnum ** o_ppElements);
    Returns an enumeration of all child content elements (non-recursively). Only elements with the given name will be returned.
OpenContentElementEnumByNameAndAttribute (BSTR i_Name, BSTR i_AttributeName, BSTR i_AttributeValue, IGrooveElementEnum ** o_ppElements);
    Returns an enumeration of all content elements within the scope of this element that have the specified tag name and attribute name with the specified attribute value.
OpenContentProcessingInstruction (long i_Ordinal, BSTR * o_pTarget, BSTR * o_pText);
    Returns the XML processing instruction at the specified ordinal position.
OpenContentProcessingInstructionTarget (long i_Ordinal, BSTR * o_pTarget);
    Returns the target of the XML processing instruction at the specified ordinal position.
OpenContentProcessingInstructionText (long i_Ordinal, BSTR * o_pText);
    Returns the PI text of the XML processing instruction at the specified ordinal position.
OpenContentText (long i_Ordinal, BSTR * o_pText);
    Returns the content text at the specified ordinal position.
OpenContentTextEnum (IGrooveBSTREnum ** o_ppText);
    Enumerates the text entries (non-recursively).
OpenElementQueue (IGrooveElementQueue ** o_ppQueue);
    Creates an element queue on the element. The element queue does not affect the element's structure.
OpenElementReferenceQueue (IGrooveElementReferenceQueue ** o_ppQueue);
    Returns the interface to the reference queue object.
OpenHRef (BSTR * o_pHref);
    Returns the value of the HREF parameter in this element's link attribute.
OpenLinkAttributes (BSTR * o_pHref, BSTR * o_pTitle, BSTR * o_pRole, GrooveXLinkShow * o_pShow, GrooveXLinkActuate * o_pActuate, GrooveXLinkSerialize * o_pSerialize);
    Retrieves all the standard link elements. Note: not all the attributes are mandatory.
OpenLinkedBinaryDocument (VARIANT_BOOL i_SingleProcess, IUnknown * i_pSecurityContext, IGrooveBinaryDocument ** o_ppDocument);
    Returns the interface to the binary document that is referenced in the HREF parameter in this element's link attribute.
OpenLinkedXMLDocument (VARIANT_BOOL i_SingleProcess, IUnknown * i_pSecurityContext, IGrooveXMLDocument ** o_ppDocument);
    Returns the interface to the XML document that is referenced in the HREF parameter in this element's link attribute.
OpenMultiReaderElementQueueReader (IGrooveMultiReaderElementQueueReader ** o_ppQueue);
    Creates an element multi-reader queue on the element and adds a reader. This could change the structure of the element.
OpenMultiReaderElementQueueWriter (GrooveMultiReaderQueueOptions i_Options, IGrooveMultiReaderElementQueueWriter ** o_ppQueue);
    Creates an element multi-writer queue on the element and adds a writer. This could change the structure of the element.
OpenMultiReaderElementReferenceQueueReader (IGrooveMultiReaderElementQueueReader ** o_ppQueue);
    Returns the interface to the multi-reader element reference queue reader object.
OpenMultiReaderElementReferenceQueueWriter (GrooveMultiReaderQueueOptions i_Options, IGrooveMultiReaderElementQueueWriter ** o_ppQueue);
    Returns the interface to the multi-reader element reference queue writer object.
OpenName (BSTR * o_pName);
    Returns the element's tag name.
OpenParent (IGrooveElement ** o_ppParent);
    Gets an object's parent element. An element can have only a single parent and may only be referenced from a single content entry of a single element.
OpenReadOnlyElement (VARIANT_BOOL i_AllowOpenParent, IGrooveReadOnlyElement ** o_ppReadOnlyElement);
    Returns the read-only element interface to this element.
OpenReference (IGrooveElementReference ** o_ppElementReference);
    Returns the element reference interface to this element.
OpenRole (BSTR * o_pRole);
    Returns the value of the Role parameter in this element's link attribute.
OpenTitle (BSTR * o_pTitle);
    Returns the value of the Title parameter in this element's link attribute.
OpenURI (BSTR * o_pName);
    Returns the URI to this element.
OpenXMLDocument (IGrooveXMLDocument ** o_ppDocument);
    Returns the interface pointer to the XML document containing this element.
Serialize (GrooveSerializeType i_Type, enum GrooveCharEncoding i_Encoding, GrooveSerializeOptions i_Options, IGrooveByteInputStream ** o_ppStream);
    Serializes the element to a stream with the specified encoding and options.
SerializeReturnAdditionalLinkedDocuments (GrooveSerializeType i_Type, enum GrooveCharEncoding i_Encoding, GrooveSerializeOptions i_Options, IGrooveDocumentEnum ** o_ppAdditionalLinkedDocuments, IGrooveByteInputStream ** o_ppStream);
    Serializes the element to a stream with the specified encoding and options. Returns an enumeration of interfaces to documents referenced by links in this element and all descendents.
SerializeToStream (IGrooveByteOutputStream * i_pStream, GrooveSerializeType i_Type, enum GrooveCharEncoding i_Encoding, GrooveSerializeOptions i_Options);
    Serializes the element to a stream with the specified encoding and options.
SerializeToStreamReturnAdditionalLinkedDocuments (IGrooveByteOutputStream * i_pStream, GrooveSerializeType i_Type, enum GrooveCharEncoding i_Encoding, GrooveSerializeOptions i_Options, IGrooveDocumentEnum ** o_ppAdditionalLinkedDocuments);
    Serializes the element to a stream with the specified encoding and options. Returns an enumeration of interfaces to documents referenced by links in this element and all descendents.
SetAttribute (BSTR i_Name, BSTR i_Value);
    Sets any arbitrary attribute as text.
SetAttributeAsBinary (BSTR i_Name, IGrooveByteInputStream * i_pValue);
    Sets any arbitrary attribute as Binary. The attribute must have been set as the given type or be specified as that type in the document schema.
SetAttributeAsBinaryArray (BSTR i_Name, SAFEARRAY(BYTE) * i_pValue);
    Sets any arbitrary attribute as Binary and returns the value in an array. The attribute must have been set as the given type or be specified as that type in the document schema.
SetAttributeAsBool (BSTR i_Name, VARIANT_BOOL i_Value);
    Sets any arbitrary attribute as Boolean. The attribute must have been set as the given type or be specified as that type in the document schema.
SetAttributeAsDouble (BSTR i_Name, double i_Value);
    Sets any arbitrary attribute as Double. The attribute must have been set as the given type or be specified as that type in the document schema.
SetAttributeAsGrooveID (BSTR i_Name, double i_pValue);
    Sets any arbitrary attribute as a Groove identifier. The attribute must have been set as the given type or be specified as that type in the document schema.
SetAttributeAsLong (BSTR i_Name, long i_Value);
    Sets any arbitrary attribute as Long. The attribute must have been set as the given type or be specified as that type in the document schema.
SetAttributeAsVARIANT (BSTR i_Name, VARIANT * i_pValue);
    Sets any arbitrary attribute using a Variant, which may be any variant type.
SetContent (long i_Ordinal, BSTR i_Text, GrooveContentType i_Type);
    Sets the content as the type's ordinal position to the specified text. Note that content of different types have independent ordinal positions.
SetContentElement (long i_Ordinal, IGrooveElement * i_pElement);
    Sets the content element at the specified ordinal position.
SetContentProcessingInstruction (long i_Ordinal, BSTR i_Target, BSTR i_Text);
    Sets the content processing instruction at the specified ordinal position.
SetContentTextEnum (IGrooveBSTREnum * i_pEnum);
    Creates text entries, separated by <BR> elements, for each text string in the enumerator.
SetLinkAttributes (BSTR i_Href, BSTR Sets the linkattributes needed to make the i_Title, BSTR i_Role, GrooveXLinkShowelement a link element, including the i_Show, GrooveXLinkActuatei_Actuate, ‘xml:link’ attribute, which is implicitly set toGrooveXLinkSerialize i_Serialize); ‘simple’. SetName (BSTR i_Name); Setsthe name of the element. SetTempAttribute (BSTR i_Name, BSTR Sets anattribute with a temporary value, i_Value); which will not be committedin a transaction.

Table 16 illustrates the methods for an interface 1212 (IGrooveReadOnlyElement) for a client of a storage manager that needs to manipulate read-only elements within XML documents. Read-only elements are a sub-class of elements, that is, all of the methods for IGrooveElement also apply to IGrooveReadOnlyElement.

TABLE 16
interface IGrooveReadOnlyElement : IGrooveElement

OpenReadOnlyParent (IGrooveReadOnlyElement ** o_ppParent);
    Returns a read-only element interface to the parent of this element.
OpenContentReadOnlyElement (long i_Ordinal, IGrooveReadOnlyElement ** o_ppElement);
    Returns a read-only element interface to the content element at the specified Ordinal position.
OpenContentReadOnlyElementByName (BSTR i_Name, IGrooveReadOnlyElement ** o_ppElement);
    Within the context of this element, find an element with the specified tag name and return its read-only interface.
FindContentReadOnlyElementByName (BSTR i_Name, IGrooveReadOnlyElement ** o_ppElement, VARIANT_BOOL * o_pFound);
    Within the context of this element, find an element with the specified tag name and return its read-only interface. If the element is not found, Found is FALSE and no element reference is returned.
OpenContentReadOnlyElementEnum (IGrooveReadOnlyElementEnum ** o_ppElements);
    Returns an enumeration of all child content elements read-only interfaces (non-recursively).
OpenContentReadOnlyElementEnumByName (BSTR i_Name, IGrooveReadOnlyElementEnum ** o_ppElements);
    Returns an enumeration of all child content elements read-only interfaces (non-recursively). Only elements with the given name will be returned.

Table 17 illustrates an interface 1214 (IGrooveElementReference) for a client of a storage manager that needs to manipulate element references within XML documents. The storage manager element reference interface includes the following methods:

TABLE 17
Interface IGrooveElementReference : IDispatch

OpenElement (IGrooveReadOnlyElement ** o_ppElement);
    Returns a read-only element interface to the referenced element.

An interface 1216 (IGrooveElementUtilBase) for use within the storage manager's other interfaces is shown in Table 18. The IGrooveElementUtilBase is not an interface for commonly-used objects, but is intended to serve as the base class for other sub-classes (shown in FIG. 13) that do have commonly-used objects. All of the “util” interfaces are associated with an element. The storage manager element util base interface includes the following methods:

TABLE 18
Interface IGrooveElementUtilBase : IDispatch

OpenDocument (IGrooveXMLDocument ** o_ppDocument);
    Returns the interface of the containing XML document.
OpenElement (IGrooveElement ** o_ppElement);
    Returns the element's interface.

Table 19 illustrates an interface 1218 (IGrooveBoundCode) for a client of a storage manager that needs to handle executable code associated with elements within XML documents. The storage manager bound code interface includes the following methods:

TABLE 19
interface IGrooveBoundCode : IDispatch

SetElement (IGrooveElement * i_pElement);
    Sets the element interface pointer associated with this element tag.
OpenElement (IGrooveElement ** o_ppElement);
    Retrieves the element interface pointer associated with this element tag.

FIG. 13 illustrates interfaces which are sub-classes of the IGrooveElementUtilBase base class 1300, discussed above. Table 20 illustrates an interface 1302 (IGrooveElementQueue) for a client of a storage manager that needs to manipulate queues on elements within XML documents. Element queues are a sub-class of the “util” base class, that is, all of the methods for IGrooveElementUtilBase also apply to IGrooveElementQueue. The storage manager element queue interface includes the following methods:

TABLE 20
interface IGrooveElementQueue : IGrooveElementUtilBase

Enqueue (IGrooveElement * i_pElement);
    Enqueues the element. Note that the element must already be contained in the queue's document.
Dequeue (long i_TimeoutMilliseconds, IGrooveElement ** o_ppElement);
    Dequeues the next available element in the queue. Returns only when an element is available or after the timeout period. The returned IGrooveElement pointer will be NULL if the timeout period expires.
DequeueEnum (long i_TimeoutMilliseconds, IGrooveElementEnum ** o_ppElements);
    Dequeues all available elements in the queue. Returns only when an element is available or after the timeout period. The returned IGrooveElement pointer will be NULL if the timeout period expires.
OpenEvent (IGrooveEvent ** o_ppEvent);
    Returns an event that can be used to ‘Wait’ for an element to be enqueued.
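The timeout behavior described for Dequeue and DequeueEnum can be sketched as follows. This is a minimal illustration in Python, not the Groove implementation: the class ElementQueue and its method names are hypothetical, elements are plain Python values, and None stands in for the NULL pointer returned when the timeout period expires.

```python
import queue


class ElementQueue:
    """Hypothetical stand-in for IGrooveElementQueue (not the Groove API)."""

    def __init__(self):
        self._items = queue.Queue()

    def enqueue(self, element):
        # Add an element to the tail of the queue.
        self._items.put(element)

    def dequeue(self, timeout_ms):
        """Return the next element, or None if the timeout expires first."""
        try:
            return self._items.get(timeout=timeout_ms / 1000.0)
        except queue.Empty:
            return None

    def dequeue_all(self, timeout_ms):
        """Wait for one element, then drain whatever else is already queued."""
        first = self.dequeue(timeout_ms)
        if first is None:
            return None
        drained = [first]
        while True:
            try:
                drained.append(self._items.get_nowait())
            except queue.Empty:
                return drained
```

A caller would typically loop on dequeue with a modest timeout and treat None as "nothing arrived yet" rather than as an error.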

Table 21 illustrates an interface 1306 (IGrooveElementReferenceQueue) for a client of a storage manager that needs to manipulate queues on element references within XML documents. Element reference queues are a sub-class of the “util” base class, that is, all of the methods for IGrooveElementUtilBase also apply to IGrooveElementReferenceQueue. The storage manager element reference queue interface includes the following methods:

TABLE 21
interface IGrooveElementReferenceQueue : IGrooveElementUtilBase

Enqueue (IGrooveElement * i_pElement);
    Enqueues the element. Note that the element must already be contained in the queue's document.
EnqueueReference (IGrooveElement * i_pElement);
    Enqueues a reference to the element. Note that the element must already be contained in the queue's document.
Dequeue (long i_TimeoutMilliseconds, IGrooveElementReference ** o_ppElementReference);
    Dequeues the next available element in the queue. Returns only when an element is available or after the timeout period. The returned IGrooveElementReference pointer will be NULL if the timeout period expires.
DequeueEnum (long i_TimeoutMilliseconds, IGrooveElementReferenceEnum ** o_ppElementReferences);
    Dequeues all available elements in the queue. Returns only when an element is available or after the timeout period. The returned IGrooveElementReferenceEnum pointer will be NULL if the timeout period expires.
OpenEvent (IGrooveEvent ** o_ppEvent);
    Returns an event that can be used to ‘Wait’ for an element to be enqueued.

Table 22 illustrates an interface 1310 (IGrooveMultiReaderElementQueueReader) for a client of a storage manager that needs to remove elements from multi-reader queues on elements within XML documents. Multi-reader element queues are a sub-class of the “util” base class, that is, all of the methods for IGrooveElementUtilBase also apply to IGrooveMultiReaderElementQueueReader. The storage manager multi-reader element queue reader interface includes the following methods:

TABLE 22
interface IGrooveMultiReaderElementQueueReader : IGrooveElementUtilBase

Dequeue (long i_TimeoutMilliseconds, IGrooveElement ** o_ppElement);
    Dequeues the next available element in the queue. Returns only when an element is available or after the timeout period. The returned IGrooveElement pointer will be NULL if the timeout period expires.
DequeueEnum (long i_TimeoutMilliseconds, IGrooveElementEnum ** o_ppElements);
    Dequeues all available elements in the queue. Returns only when an element is available or after the timeout period. The returned IGrooveElement pointer will be NULL if the timeout period expires.
OpenEvent (IGrooveEvent ** o_ppEvent);
    Returns an event that can be used to ‘Wait’ for an element to be enqueued.

Table 23 illustrates an interface 1314 (IGrooveMultiReaderElementQueueWriter) for a client of a storage manager that needs to add elements to multi-reader queues on elements within XML documents. Multi-reader element queues are a sub-class of the “util” base class, that is, all of the methods for IGrooveElementUtilBase also apply to IGrooveMultiReaderElementQueueWriter. The storage manager multi-reader element queue writer interface includes the following methods:

TABLE 23
interface IGrooveMultiReaderElementQueueWriter : IGrooveElementUtilBase

Enqueue (IGrooveElement * i_pElement, long * o_pNumEnqueued);
    Enqueues the element and returns the number already enqueued. Note that the element must already be contained in the queue's document.
GetNumReaders (long * o_pNumReaders);
    Get the number of readers on the queue.

Table 24 illustrates an interface 1318 (IGrooveMultiReaderElementReferenceQueueWriter) for a client of a storage manager that needs to add element references to multi-reader queues on elements within XML documents. Multi-reader element reference queues are a sub-class of the “util” base class, that is, all of the methods for IGrooveElementUtilBase also apply to IGrooveMultiReaderElementReferenceQueueWriter. The storage manager multi-reader element reference queue writer interface includes the following methods:

TABLE 24
interface IGrooveMultiReaderElementReferenceQueueWriter : IGrooveElementUtilBase

Enqueue (IGrooveElement * i_pElement, long * o_pNumEnqueued);
    Enqueues the element and returns the number already enqueued. Note that the element must already be contained in the queue's document.
EnqueueReference (IGrooveElement * i_pElement, long * o_pNumEnqueued);
    Enqueues the element reference and returns the number already enqueued. Note that the element must already be contained in the queue's document.
GetNumReaders (long * o_pNumReaders);
    Get the number of readers on the queue.

Table 25 illustrates an interface 1316 (IGrooveMultiReaderElementReferenceQueueReader) for a client of a storage manager that needs to remove element references from multi-reader queues on elements within XML documents. Multi-reader element reference queues are a sub-class of the “util” base class, that is, all of the methods for IGrooveElementUtilBase also apply to IGrooveMultiReaderElementReferenceQueueReader. The storage manager multi-reader element reference queue reader interface includes the following methods:

TABLE 25
interface IGrooveMultiReaderElementReferenceQueueReader : IGrooveElementUtilBase

Dequeue (long i_TimeoutMilliseconds, IGrooveElementReference ** o_ppElementReference);
    Dequeues the next available element reference in the queue. Returns only when an element is available or after the timeout period. The returned IGrooveElementReference pointer will be NULL if the timeout period expires.
DequeueEnum (long i_TimeoutMilliseconds, IGrooveElementReferenceEnum ** o_ppElementReferences);
    Dequeues all available element references in the queue. Returns only when an element is available or after the timeout period. The returned IGrooveElementReference pointer will be NULL if the timeout period expires.
OpenEvent (IGrooveEvent ** o_ppEvent);
    Returns an event that can be used to ‘Wait’ for an element to be enqueued.

Table 26 illustrates an interface 1304 (IGrooveRPCClient) for a client of a storage manager that needs to perform remote procedure calls (RPCs) on elements within XML documents. RPC clients are a sub-class of the “util” base class, that is, all of the methods for IGrooveElementUtilBase also apply to IGrooveRPCClient. The storage manager RPC client interface includes the following methods:

TABLE 26
interface IGrooveElementRPCClient : IGrooveElementUtilBase

DoCall (IGrooveElement * i_pInput, IGrooveElement ** o_ppOutput);
    Make an RPC, using the Input element as the input parameters and receiving output parameters in the Output element.
SendCall (IGrooveElement * i_pInput);
    Make an asynchronous RPC, using the Input element as the input parameters.
OpenResponseQueue (IGrooveElementQueue ** o_ppQueue);
    Returns the queue where responses are received.

An interface 1308 (IGrooveRPCServerThread) for a client of a storage manager that needs to handle remote procedure calls (RPCs) on elements within XML documents is shown in Table 27. RPC server threads are a sub-class of the “util” base class, that is, all of the methods for IGrooveElementUtilBase also apply to IGrooveRPCServerThread. The storage manager RPC server callback interface has no methods of its own, only those inherited from IGrooveElementUtilBase. It is provided as a distinct interface for type checking.

TABLE 27
interface IGrooveElementRPCServerThread : IGrooveElementUtilBase

(none)

Table 28 illustrates an interface 1312 (IGrooveRPCServer) for a client of a storage manager that needs to handle remote procedure calls (RPCs) on elements within XML documents. RPC servers are a sub-class of the “util” base class, that is, all of the methods for IGrooveElementUtilBase also apply to IGrooveRPCServer. The storage manager RPC server interface includes the following methods:

TABLE 28
interface IGrooveElementRPCServer : IGrooveElementUtilBase

OpenCallQueue (IGrooveElementQueue ** o_ppQueue);
    Returns the queue where calls are received.
SendResponse (IGrooveElement * i_pInput, IGrooveElement * i_pOutput, VARIANT_BOOL * o_bResult);
    Sends a response to the caller, returning output parameters in the Output element.
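The relationship between the RPC client and RPC server interfaces, in which calls travel through a call queue and responses return through a response queue, might be sketched as below. This is only an illustration under stated assumptions: the classes RPCServer and RPCClient, the use of Python dictionaries as input and output elements, and the one-second default timeout are all hypothetical and are not taken from the interfaces above.

```python
import queue
import threading


class RPCServer:
    """Hypothetical server: calls arrive on a call queue."""

    def __init__(self, handler):
        self.call_queue = queue.Queue()
        self._handler = handler

    def serve_one(self):
        # Take one call, compute the output element, send the response.
        input_element, respond = self.call_queue.get()
        respond(self._handler(input_element))


class RPCClient:
    """Hypothetical client: responses arrive on a response queue."""

    def __init__(self, server):
        self._server = server
        self.response_queue = queue.Queue()

    def send_call(self, input_element):
        # Asynchronous call: enqueue and return immediately.
        self._server.call_queue.put((input_element, self.response_queue.put))

    def do_call(self, input_element, timeout=1.0):
        # Synchronous call: enqueue, then block until the response arrives.
        self.send_call(input_element)
        return self.response_queue.get(timeout=timeout)


server = RPCServer(lambda element: {"sum": element["a"] + element["b"]})
client = RPCClient(server)
worker = threading.Thread(target=server.serve_one)
worker.start()
result = client.do_call({"a": 1, "b": 2})
worker.join()
```

In this sketch the synchronous call is built on top of the asynchronous one, which mirrors the split between DoCall and SendCall with OpenResponseQueue.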

The following tables illustrate allowed values for the enumerated data types listed in the above interfaces. In particular, Table 29 illustrates allowed values for the GrooveSerializeType enumerated data type.

TABLE 29
GrooveSerializeType

GrooveSerializeAuto
    On input, Groove will determine the correct format by examining the first few bytes of the input stream. On output, Groove will select a format based on the kind of document or element data.
GrooveSerializeMIME
    Format is MHTML, as defined in RFC 2557.
GrooveSerializeXML
    Format is XML. Note that binary documents are not supported with this format, but it may be a body type in MHTML.
GrooveSerializeWBXML
    Format is WBXML. Note that binary documents are not supported with this format, but it may be a body type in MHTML.

Table 30 illustrates the allowed values for the GrooveSerializeOptions enumerated data type.

TABLE 30
GrooveSerializeOptions

GrooveSerializeDefault
    Use default serialization options.
GrooveSerializeWithFormatting
    Indent, with blanks, each level of child content elements beneath the parent element.
GrooveSerializeSortedAttrs
    Output the attributes for each element in order of ascending attribute name.
GrooveSerializeNoFragmentWrapper
    Output without the fragment wrapper for document fragments (elements).
GrooveSerializeNoNamespaceContraction
    Output with fully expanded element and attribute names.
GrooveSerializeNoProlog
    Output without the XML document prolog.
GrooveSerializeNoLinks
    Output without linked documents.
GrooveSerializeNotMinimum
    Don't spend as much local processor time as needed to ensure the resulting output is the minimum size.

Table 31 illustrates the allowed values for the GrooveParseOptions enumerated data type.

TABLE 31
GrooveParseOptions

GrooveParseDefault
    Use default parse options.
GrooveParseStripContentWhitespace
    Remove all extraneous whitespace from element content.
GrooveParseNoFragment
    Parse a fragment that doesn't have a fragment wrapper.
GrooveParseNoNamespaceExpansion
    Parse the document, but don't expand namespaces to their fully qualified form.
GrooveParseNoLinks
    Parse a document and skip the links.

Table 32 illustrates the allowed values for the GrooveContentType enumerated data type.

TABLE 32
GrooveContentType

GrooveContentElement
    Content is a child element.
GrooveContentText
    Content is body text.
GrooveContentCDATASection
    Content is a CDATA section.
GrooveContentProcessingInstruction
    Content is a processing instruction.
GrooveContentComment
    Content is a comment.

Table 33 illustrates the allowed values for the GrooveXLinkShow enumerated data type.

TABLE 33
GrooveXLinkShow

GrooveXLinkShowNew
    New.
GrooveXLinkShowParsed
    Parsed.
GrooveXLinkShowReplace
    Replace.

Table 34 illustrates the allowed values for the GrooveXLinkActuate enumerated data type.

TABLE 34
GrooveXLinkActuate

GrooveXLinkActuateUser
    User.
GrooveXLinkActuateAuto
    Auto.

Table 35 illustrates the allowed values for the GrooveXLinkSerialize enumerated data type.

TABLE 35
GrooveXLinkSerialize

GrooveXLinkSerializeByValue
    By value.
GrooveXLinkSerializeByReference
    By reference.
GrooveXLinkSerializeIgnore
    Ignore.

Table 36 illustrates the allowed values for the GrooveMultiReaderQueueOptions enumerated data type.

TABLE 36
GrooveMultiReaderQueueOptions

GrooveMRQDefault
    Use default options.
GrooveMRQAllReceive
    All readers receive each event notification.
GrooveMRQEnqueueIfNoReaders
    Enqueue even if no reader is currently queued to receive the element.
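One possible reading of the multi-reader queue options is sketched below in Python. The class MultiReaderQueue and all of its behavior are assumptions made for illustration: every reader is given its own pending list (the "all readers receive" case), an element enqueued while no reader exists is dropped unless the enqueue_if_no_readers flag is set, and held elements are handed to the first reader that attaches. The text above does not specify these details.

```python
from collections import deque


class MultiReaderQueue:
    """Hypothetical model of a multi-reader queue (not the Groove API)."""

    def __init__(self, enqueue_if_no_readers=False):
        self.enqueue_if_no_readers = enqueue_if_no_readers
        self._readers = {}       # reader id -> deque of pending elements
        self._backlog = deque()  # elements held while no reader exists

    def add_reader(self, reader_id):
        # Assumption: a new reader inherits anything held in the backlog.
        self._readers[reader_id] = deque(self._backlog)
        self._backlog.clear()

    def num_readers(self):
        return len(self._readers)

    def enqueue(self, element):
        if self._readers:
            # Fan out: every reader receives its own copy of the entry.
            for pending in self._readers.values():
                pending.append(element)
        elif self.enqueue_if_no_readers:
            self._backlog.append(element)
        # Otherwise the element is dropped (assumed default behavior).

    def dequeue(self, reader_id):
        pending = self._readers[reader_id]
        return pending.popleft() if pending else None
```

Keeping one pending list per reader means a slow reader never blocks a fast one, at the cost of holding each element once per reader.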

The fundamental data model of the storage manager is XML. XML is a semi-structured, hierarchical, hyper-linked data model. Many real-world problems are not well represented with such complex structures and are better represented in tabular form. For example, spreadsheets and relational databases provide simple, tabular interfaces. In accordance with one aspect of the invention, in order to simplify the representation, XML structures are mapped to a tabular display, generally called a “waffle”. The waffle represents a collection of data. This mapping is performed by the collection manager, a component of the storage manager.

Collections are defined by a collection descriptor, which is an XML document type description. Like a document schema, the collection descriptor is a special kind of document that is stored apart from the collection data itself. There are many sources of collection data, but the primary source of collection data is a software routine called a record set engine. Driven by user commands, the record set engine propagates a set of updates for a collection to the collection manager. Based on those updates, the collection manager updates index structures and may notify waffle users via the notification system. When a waffle user needs updated or new collection data, the waffle user will call the collection manager to return a new result array containing the updated data. The waffle user may also navigate within the collection using cursors.

The following list shows the XML DTD contents for a collection descriptor document:

<!ELEMENT Collection ANY>
<!ATTLIST Collection
  Name CDATA #REQUIRED
  Start (record|index) “record” #REQUIRED
  Version CDATA #REQUIRED
  Location CDATA #IMPLIED
>
<!ELEMENT Level (Column|Sorting|Level)*>
<!ATTLIST Level
  Mapping (Flatten|Direct)
  Links (Embed|Traverse) “Traverse”
>
<!ELEMENT Column EMPTY>
<!ATTLIST Column
  Source CDATA #REQUIRED
  Output CDATA #REQUIRED
  MultiValue (OnlyFirst|MultiLine|Concatenate) “OnlyFirst”
  MultiValueSeparator CDATA #IMPLIED “,”
>
<!ELEMENT Sorting SortDescription+>
<!ELEMENT SortDescription Group?|SortColumn+|Interval?>
<!ATTLIST SortDescription
  Name CDATA #REQUIRED
>
<!ELEMENT SortColumn EMPTY>
<!ATTLIST SortColumn
  Source CDATA #REQUIRED
  Order (Ascending|Descending) #REQUIRED
  DataType CDATA #REQUIRED
  Strength (Primary|Secondary|Tertiary|Identical) “Identical”
  Decomposition (None|Canonical|Full) “None”
>
<!ELEMENT Group Group?|GroupColumn+>
<!ATTLIST Group
  Grouping (Unique|Units) #REQUIRED
  GroupUnits (Years|Months|Days|Hours)
  AtGroupBreak (None|Count|Total) “None”
  Order (Ascending|Descending) #REQUIRED
  Strength (Primary|Secondary|Tertiary|Identical) “Identical”
  Decomposition (None|Canonical|Full) “None”
>
<!ELEMENT GroupColumn EMPTY>
<!ATTLIST GroupColumn
  Source CDATA #REQUIRED
>
<!ELEMENT Interval EMPTY>
<!ATTLIST Interval
  Start CDATA #REQUIRED
  End CDATA #REQUIRED
>

Every Collection has a name that is used to reference the collection. The Start attribute specifies how to find the “root” of the collection. A collection with a record root is just a set of records, whereas a collection that starts with an index is navigated through the index and then the set of records. An index may be a concordance or full-text. The optional Location attribute is a relative URL that identifies where in the root to actually begin.

A Level defines the contents of part of the output hierarchy. A level consists of the columns in the level, the ordering or grouping of records in the level, and definitions of sub-levels. A level is associated with records in the source record stream through the Mapping attribute. If the mapping is Direct, a level represents a single source record type. If the mapping is Flatten, the level contains a source record type and all descendants of that record. The Flatten mapping may only be specified on the only or lowest level in the collection. The Links attribute specifies how records with link attributes should be handled. If Links is Traverse, the linked record will be output as a distinct level. If Links is Embed, the child record of the source record will appear as though it is part of the source record.

A Column defines the mapping between a source field and the output array column. The Source attribute is an XSLT path expression in the source records. The Output attribute is the name of the field in the result array. The MultiValue and MultiValueSeparator attributes define how multi-valued source values are returned in the result.
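The effect of the MultiValue and MultiValueSeparator attributes can be illustrated with a short Python sketch. The function resolve_multivalue is hypothetical, and two of its behaviors are assumptions rather than statements from the text: MultiLine is taken to mean that the values are joined with newlines, and an empty list of source values is taken to produce an empty string.

```python
def resolve_multivalue(values, mode="OnlyFirst", separator=","):
    """Collapse a multi-valued source field into one result-array cell.

    mode mirrors the MultiValue attribute; separator mirrors
    MultiValueSeparator and is used only by Concatenate.
    """
    if not values:
        return ""  # assumption: no source values yields an empty cell
    if mode == "OnlyFirst":
        return values[0]
    if mode == "MultiLine":
        return "\n".join(values)  # assumption: one value per line
    if mode == "Concatenate":
        return separator.join(values)
    raise ValueError(f"unknown MultiValue mode: {mode}")


authors = ["Ada", "Grace", "Edsger"]
first = resolve_multivalue(authors)
joined = resolve_multivalue(authors, "Concatenate", "; ")
```

The defaults in the signature follow the DTD defaults, OnlyFirst for MultiValue and a comma for MultiValueSeparator.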

Every collection must have at least one defined order. The order can be a sorted collation or a multi-level grouping with aggregate functions.

The SortColumn element defines the collation characteristics within a SortDescription. The Source attribute defines the name of the output column to be sorted. The Order must be either Ascending or Descending. The Strength and Decomposition values are input parameters that have the same meaning as defined in Unicode.

The two kinds of grouping are by unique values and by units. When a collection is grouped by unique values, all records with the same GroupColumn values will be together in the same group; breaks between groups will occur at the change of GroupColumn values. When a collection is grouped by units, all records with the same GroupColumn values, resolved to the value of GroupUnits, will be together in the same group. For example, if GroupUnits is “Days”, all records for a given day will be in the same group. If AtGroupBreak is specified, a synthetic row will be returned that contains the result of the aggregate function at each value or unit break value.
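Grouping by units with a Count aggregate at each group break can be sketched as follows. This Python illustration is not the collection manager's algorithm: the functions truncate and group_by_units, the dictionary representation of records, the "_break" and "_count" keys of the synthetic row, and the placement of the synthetic row after its group are all assumptions. Only the Count aggregate is modeled.

```python
from datetime import datetime
from itertools import groupby


def truncate(ts, units):
    """Resolve a timestamp down to the start of its GroupUnits bucket."""
    if units == "Years":
        return ts.replace(month=1, day=1, hour=0, minute=0, second=0, microsecond=0)
    if units == "Months":
        return ts.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    if units == "Days":
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if units == "Hours":
        return ts.replace(minute=0, second=0, microsecond=0)
    raise ValueError(f"unknown GroupUnits: {units}")


def group_by_units(records, column, units, at_group_break="None"):
    """Order records by column and emit a synthetic row at each unit break."""
    ordered = sorted(records, key=lambda r: r[column])
    rows = []
    for bucket, members in groupby(ordered, key=lambda r: truncate(r[column], units)):
        members = list(members)
        rows.extend(members)
        if at_group_break == "Count":
            # Synthetic row carrying the aggregate for the group just ended.
            rows.append({"_break": bucket, "_count": len(members)})
    return rows


records = [
    {"Title": "b", "_Modified": datetime(2000, 1, 2, 9, 0)},
    {"Title": "a", "_Modified": datetime(2000, 1, 1, 8, 0)},
    {"Title": "c", "_Modified": datetime(2000, 1, 1, 17, 30)},
]
rows = group_by_units(records, "_Modified", "Days", at_group_break="Count")
```

Grouping by unique values is the same procedure with the identity function in place of truncate.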

The GroupColumn identifies the result column to be grouped.

The Interval identifies the two fields in each record that define a range. The datatypes of the Start and End columns must be either numeric or datetime.

The following example shows a collection descriptor document for a simple document discussion record view with six collation orders:

<Collection Name=“Main” Start=“Record” Version=“0,1,0,0”>
 <Level Mapping=“Flatten”>
  <Column Source=“Title” Output=“Title”/>
  <Column Source=“_Modified” Output=“_Modified”/>
  <Column Source=“_CreatedBy” Output=“_CreatedBy”/>
  <Sorting>
   <SortDescription Name=“ByAscModified”>
    <SortColumn Source=“_Modified” Order=“Ascending” DataType=“DateTime”/>
   </SortDescription>
   <SortDescription Name=“ByDescModified”>
    <SortColumn Source=“_Modified” Order=“Descending” DataType=“DateTime”/>
   </SortDescription>
   <SortDescription Name=“ByAscAuthor”>
    <SortColumn Source=“_CreatedBy” Order=“Ascending” DataType=“String”/>
   </SortDescription>
   <SortDescription Name=“ByDescAuthor”>
    <SortColumn Source=“_CreatedBy” Order=“Descending” DataType=“String”/>
   </SortDescription>
   <SortDescription Name=“ByAscTitle”>
    <SortColumn Source=“Title” Order=“Ascending” DataType=“String”/>
   </SortDescription>
   <SortDescription Name=“ByOrdinal”>
    <SortColumn Source=“” Order=“Ordinal” DataType=“Long”/>
   </SortDescription>
  </Sorting>
 </Level>
</Collection>
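As an illustration of how a client might read such a descriptor, the following Python sketch parses a shortened version of the example with the standard library's xml.etree.ElementTree and lists the output columns and the named sort descriptions. The function summarize and the shortened descriptor are hypothetical, and the descriptor uses plain ASCII quotation marks so that it is well-formed XML.

```python
import xml.etree.ElementTree as ET

# Shortened form of the discussion-view descriptor, with ASCII quotes.
DESCRIPTOR = """
<Collection Name="Main" Start="Record" Version="0,1,0,0">
  <Level Mapping="Flatten">
    <Column Source="Title" Output="Title"/>
    <Column Source="_Modified" Output="_Modified"/>
    <Sorting>
      <SortDescription Name="ByAscModified">
        <SortColumn Source="_Modified" Order="Ascending" DataType="DateTime"/>
      </SortDescription>
      <SortDescription Name="ByAscTitle">
        <SortColumn Source="Title" Order="Ascending" DataType="String"/>
      </SortDescription>
    </Sorting>
  </Level>
</Collection>
"""


def summarize(descriptor_xml):
    """Return (output column names, {sort name: [(source, order), ...]})."""
    root = ET.fromstring(descriptor_xml.strip())
    level = root.find("Level")
    columns = [column.get("Output") for column in level.findall("Column")]
    sorts = {
        description.get("Name"): [
            (sort.get("Source"), sort.get("Order"))
            for sort in description.findall("SortColumn")
        ]
        for description in level.iter("SortDescription")
    }
    return columns, sorts


columns, sorts = summarize(DESCRIPTOR)
```

A waffle user could use the returned sort names to choose among the collation orders that the descriptor defines.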

The following example shows a collection descriptor for a calendar view. It is similar to the prior example, but with a small change to the sort description the collection is ordered by ranges of date intervals.

<Collection Name=“Main” Start=“Record” Version=“0,1,0,0”>
 <Level Mapping=“Flatten”>
  <Column Source=“from-attributes(Subject)” Output=“Subject”/>
  <Column Source=“from-attributes(Start)” Output=“Start”/>
  <Column Source=“from-attributes(End)” Output=“End”/>
  <Column Source=“from-attributes(RecurrenceEnd)” Output=“RecurrenceEnd”/>
  <Column Source=“from-attributes(IsAllDay)” Output=“IsAllDay”/>
  <Column Source=“from-attributes(IsRecurrent)” Output=“IsRecurrent”/>
  <Sorting>
   <SortDescription Name=“DateRanges”>
    <Interval Start=“Start” End=“End”/>
   </SortDescription>
  </Sorting>
 </Level>
</Collection>

As with the basic storage manager, the collection manager is implemented in an object-oriented environment. Accordingly, both the collection manager itself and all of the collection components, including collections, waffles, cursors, result arrays and the record set engine, are implemented as objects. These objects, their interface, the underlying structure and the API used to interface with the collection manager are illustrated in FIG. 14. The API is described in more detail in connection with FIG. 15. Referring to FIG. 14, the collection manager provides shared access to collections via the collection manipulation API 1402, but, in order to enable a full programming model for client applications, additional communication and synchronization operations are provided within the context of a collection. For example, a user can control a record set engine 1412 by means of the engine API 1404. Under control of commands in the engine API 1404, the record set engine 1412 propagates a set of updates for a collection to the distributed virtual object system 1410 that is discussed above. Based on those updates, the distributed virtual object system 1410 updates index and other structures.

Other client components may need to be aware of changes within components, such as waffles, managed by the collection manager. Accordingly, the collection manager provides an interface 1400 to an interest-based notification system 1406 for those client components. The notification system 1406 provides notifications to client component listeners who have registered an interest when values within objects 1408 that represent a collection change.
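The interest-based notification pattern described above can be sketched in a few lines of Python. The class NotificationSystem, its method names, and the use of a collection name as the interest key are hypothetical choices made for illustration; the notification system 1406 itself is not specified at this level of detail.

```python
from collections import defaultdict


class NotificationSystem:
    """Hypothetical interest-based notifier (not the Groove implementation)."""

    def __init__(self):
        # collection name -> callbacks of listeners that registered interest
        self._listeners = defaultdict(list)

    def register_interest(self, collection_name, callback):
        self._listeners[collection_name].append(callback)

    def notify(self, collection_name, update):
        # Only listeners that registered for this collection are called.
        for callback in list(self._listeners.get(collection_name, ())):
            callback(update)


received = []
notifier = NotificationSystem()
notifier.register_interest("Main", received.append)
notifier.notify("Main", {"record": 7, "change": "modified"})
notifier.notify("Other", {"record": 1, "change": "added"})
```

Here the waffle user registered for "Main" sees only the first update, since no interest was registered for "Other".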

Collection data is represented by a set of objects including collection objects, record objects, waffle objects, cursor objects and result array objects 1408. The objects can be directly manipulated by means of the collection manipulation API 1402. The collection related objects 1408 are actually implemented by the distributed virtual object system 1410 that was discussed in detail above.

FIG. 15 and the following tables comprise a description of the interfaces for each of the objects used to implement a preferred embodiment of the inventive collection manager. As with the storage manager implementation, these objects are designed in accordance with the Component Object Model (COM), but could also be implemented using other styles of interface and object model.

Table 37 illustrates an interface 1500 (IGrooveCollectionManager) for a collection manager that encapsulates the basic framework for the major operations performed on a collection. The collection manager interface includes the following methods:

TABLE 37
Interface IGrooveCollectionManager : IGrooveDispatch

CreateCollection(IGrooveElement *i_pCollectionDescriptor, BSTR i_CollectionURL, BSTR i_EngineID, IGrooveCollection **o_ppCollection);
    Creates a new collection object. The CollectionDescriptor should contain a collection descriptor in XML according to the GrooveCollection XML DTD.
DeleteCollection(IGrooveXMLDocument *i_pSourceDocument, BSTR i_CollectionURL);
    Deletes the specified collection from the SourceDocument.
OpenCollection(IGrooveElement *i_pCollectionDescriptor, BSTR i_CollectionURL, BSTR i_EngineID, IGrooveCollection **o_ppCollection);
    Opens an existing collection object.
OpenCollectionEnum(IGrooveXMLDocument * i_pSourceDocument, IGrooveBSTREnum **o_ppCollectionNames);
    Return an enumeration of all collections within a document.
ParseCollectionDescriptor(IGrooveElement * i_pCollectionElement, void * m_Levels);
    Creates a collection document according to the specified collection descriptor.
UpdateCollection(void *i_Updates, BSTR i_EngineID, IGrooveElement ** o_ppUpdateContext);
    Perform the requested sequence of operations (of kind GrooveCollectionUpdateOp) on the collection for EngineID.

Table 38 illustrates an interface 1502 (IGrooveCollection) for a collection that encapsulates the basic framework for the major operations performed on a collection. The collection interface includes the following methods:

TABLE 38
Interface IGrooveCollection : IGrooveDispatch

AdviseListeners(IGrooveElement *i_UpdateContext);
    Notifies subscribing listeners of changes to this element.

CloseWaffle(IGrooveWaffle *i_pWaffle);
    Removes an IGrooveWaffle instance from the list of the collection's listeners.

Delete(void);
    Deletes the collection from the database.

DisableListeners(void);
    Disables event notifications for all subscribing listeners.

EnableListeners(void);
    Enables event notifications for all subscribing listeners. Event notifications are enabled by default, so this is only necessary if DisableListeners was previously called.

Find(BSTR i_pQuery, IGrooveCollection **o_ppQueryResult);
    Using the specified XSLT query expression, evaluate it on the collection and return a new collection as the result. XSLT locators have the form:
        AxisIdentifier(Node Test Predicate)
    where AxisIdentifier is one of:
        from-ancestors
        from-ancestors-or-self
        from-attributes
        from-children
        from-descendants
        from-descendants-or-self
        from-following
        from-following-siblings
        from-parent
        from-preceding
        from-preceding-siblings
        from-self
        from-source-link
    NodeTest is of the form QName and tests whether the node is an element or attribute with the specified name.
    A Predicate is of the form [ PredicateExpr ].
    PredicateExpr is an Expr.
    Expr is one of:
        VariableReference
        ( Expr )
        Literal
        Number
        FunctionCall
    Multiple predicates are separated by “/”. For example:
        from-children(ElementName[from-attributes(Attribute Name)])

GetCursor(IGrooveCollectionCursor **o_ppCursor);
    Returns a copy of the cursor currently used by the collection.

GetCursorPosition(double *o_pRelativePosition);
    Returns the relative position of the cursor as a number between 0.0 (first row) and 100.0 (last row).

GetEngineMappingTable(void **o_ppEngineURLs);
    Returns the engine mapping table.

GetExpansionMask(long *o_pMask);
    Gets the current value of the expansion mask.

GetRecordCount(long *o_pRecordCount);
    Returns the number of records in the collection.

HasOrdinalSort(BSTR *o_pSortName, VARIANT_BOOL *o_pHaveSort);
    If the collection has an ordinal index, returns the sort name and the value TRUE, otherwise it returns FALSE.

HasSort(BSTR i_ColumnName, GrooveCollationOrder i_CollationOrder, long i_Level, BSTR *o_pSortName, VARIANT_BOOL *o_pHaveSort);
    Returns a bool indicating whether or not a sort exists in the collection for the column specified by i_ColumnName on level i_Level in collation order i_AscendingSort. If a sort exists the sort name is returned in o_pSortName.

IsEmpty(VARIANT_BOOL *o_pIsEmpty);
    Returns a bool indicating whether or not the collection is empty.

MarkAll(VARIANT_BOOL i_Read);
    Sets the record read/unread indicator for all records in the collection to be the value of Read.

MarkRead(double i_RecordID);
    Sets a specific record to be marked as read.

MarkUnread(double i_RecordID);
    Sets a specific record to be marked as unread.

MoveCursor(GrooveCollectionCursorPosition i_AbsolutePosition, GrooveCollectionNavigationOp i_Navigator, long i_Distance, long *o_pDistanceMoved);
    Every collection has a cursor. The cursor establishes the starting position in the source document, which will then be used to build the result document.
    AbsolutePosition may have the values First, Last, or Current.
    Navigator may have the following values:
        Value                     Description
        NextAny, PriorAny         Move the cursor to the next/previous source row, traversing down through child rows and up through parent rows.
        NextPeer, PriorPeer       Move the cursor to the next/previous source row at the same level, stopping if a row at a higher level is reached.
        NextParent, PriorParent   Move the cursor to the next/previous parent source row, traversing until the root row is reached.
        NextData, PriorData       Move the cursor to the next/previous row that contains a data record.
        NextUnread, PriorUnread   Move the cursor to the next/previous unread row.
    Distance sets the number of iterations to move the cursor, starting at AbsolutePosition and moving through Distance iterations of Navigator movement.
    MoveCursor returns the number of iterations the cursor was actually moved.

MoveCursorToRecord(double i_RecordID);
    Sets the collection's cursor to point to the specified record.

MoveCursorToValue(BSTR i_pQuery, double *o_pRecordID);
    Using the current sort order, positions the cursor to the row that meets the criteria of matching the relop to the input query values. The relop (relational operator) may be EQ, LT, LE, GT, or GE. The query values must match, in order, the datatypes of the columns of the current sort order or must be able to be converted in a loss-less manner to those datatypes. Fewer query values may be specified than are defined in the sort order, which will result in a partial match. For collections ordered on an interval, the first query value is the interval's starting value and the second is the ending value.

MoveToCursor(IGrooveCollectionCursor *i_pCursor);
    Moves the collection to the position specified by i_pCursor.

Open(BSTR i_CollectionURL, IGrooveElement *i_pCollectionDescriptorElement, VARIANT_BOOL i_Temp, VARIANT_BOOL i_Shared, VARIANT_BOOL *o_pCreated);
    Creates or opens the collection specified by i_CollectionURL within the Groove storage service i_ServiceType. Returns a bool indicating whether or not the collection was created for the first time.

OpenRecord(double i_RecordID, IGrooveRecord **o_ppRecord);
    Returns an interface pointer to a specific record in the collection.

OpenRecordID(double i_SourceRecordID, enum GrooveCollectionNavigationOp i_Relation, double *o_pTargetRecordID);
    Starting from the position of the SourceRecordID, perform the specified collection navigation operation and return the resulting record ID.

OpenResultArray(long i_NumReturnRows, void *io_pResultArray);
    Given the collection's expansion mask, current cursor position and current sort order, return at most NumReturnRows into a result array conforming to the description below. Note that NumReturnRows is a quota only on the data rows; other synthesized header and footer rows may be returned as necessary.

        Column Name        Data Type    Description
        RowType            UINT1        ==WAFFLE_ROW_DATA if the row is a data record returned from an engine, ==WAFFLE_ROW_HEADER if the row is a synthesized header (e.g., category), ==WAFFLE_ROW_FOOTER if the row is a synthesized footer (e.g., aggregate result).
        SynthKind          UINT1        If the row is a data row, this value is 0. If the row is a synthesized row, this value will be one of: BreakUnique (indicates a change in value of categorized or sorted column; one of the ColumnName(i) columns will have the new value), BreakUnitDay, BreakUnitWeek, BreakUnitMonth, BreakUnitYear, FuncTotal, FuncCount.
        EngineID           UINT4        If the row is a data row: Index into the EngineID table, which is a vector of URLs stored as BSTRs. If the row is a synthesized row, EngineID is 0.
        RecordID           UINT4        If the row is a data row: RecordID returned from the engine identified by EngineID. RecordIDs are unique within EngineIDs. If the row is a synthesized row: RecordID is a unique number within the collection.
        Level              UINT1        Number of levels to indent this row. Level 0 is the top or outermost level.
        RelativePosition   UINT2        A number between 0 and 10000 indicating the relative offset of this row from the beginning of the collection. [It may be an approximation.] For example, 6823 is the value for a row that is 68.23% of the way through the collection.
        Read               BOOL         If the row is a data row: True if the [account??] has read the record. If the row is a synthesized row, Read is always true (even if it is collapsed).
        ColumnName(i)      Defined by the collection descriptor.    Data value for this row/column. There will be as many columns in the array as there were defined columns at all levels.

OpenSchema(long i_Level, VARIANT_BOOL i_IncludeSystemColumns, IGrooveRecordSchema **o_ppCollectionSchema);
    Returns an interface pointer to the schema description for the records in the collection.

OpenTransaction(IGrooveTransaction **o_ppTransaction);
    Creates a transaction on the collection document.

OpenWaffle(IGrooveWaffleListener *i_pListener, IGrooveWaffle **o_ppWaffle);
    Creates an IGrooveWaffle instance and adds it to the collection's list of event listeners.

SetCursorPosition(double i_RelativePosition);
    Sets the current position of the cursor to the row with the specified relative position. The position should be a number between 0.0 (first row) and 100.0 (last row).

SetExpansionMask(long i_Mask);
    Sets the current value of the expansion mask. The mask is stored in a DWORD, but only the first 10 (or so) bits are used. If a bit is set, all data at the indicated level is expanded. The expansion mask is not persistent or shared; its effect is only on this collection object. The default value of the expansion mask is all 1s.

SetRecordExpansion(double i_RecordID, VARIANT_BOOL i_Expand);
    Sets the expansion state for a single row for this scope. If Expand is true, the record will be expanded, otherwise it will be collapsed. If EngineID is 0, then all rows encompassed by the specified synthesized RecordID will be either expanded or collapsed.

Update(BSTR i_EngineURL, GrooveCollectionUpdateOp i_Operation, void *i_pUpdateRecord, IGrooveElement *io_pUpdateContext);
    Updates the collection. i_Operation is one of: OP_ADD, OP_DELETE, or OP_UPDATE.

UseSort(BSTR i_SortName, VARIANT_BOOL i_RetainCursorPosition);
    Sets the sort order for the collection to the named sort order. The specified SortName must be one of the defined sort orders in the collection descriptor. If i_RetainCursorPosition is true and the current cursor position identifies a data record, the current collection's cursor is positioned to the same record in the new sort order. Otherwise, the cursor is positioned to the first row in the new sort order.
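The expansion mask described for SetExpansionMask can be pictured with a small sketch. It is illustrative only and rests on two assumptions that the text does not state: that bit n of the mask corresponds to level n, and that a row is shown only when every level above it is expanded. The names Row, level_expanded and visible_rows are hypothetical and are not part of the described interfaces.

```python
from dataclasses import dataclass
from typing import List

ALL_EXPANDED = 0xFFFFFFFF  # default mask: all 1s in a 32-bit DWORD


@dataclass
class Row:
    record_id: int
    level: int  # level 0 is the top or outermost level


def level_expanded(mask: int, level: int) -> bool:
    """Assumed convention: bit `level` set means that level is expanded."""
    return bool(mask & (1 << level))


def visible_rows(rows: List[Row], mask: int) -> List[Row]:
    """Keep rows whose enclosing levels are all expanded, preserving order."""
    result = []
    for row in rows:
        # A row at level N is shown only if levels 0..N-1 are expanded.
        if all(level_expanded(mask, lvl) for lvl in range(row.level)):
            result.append(row)
    return result


rows = [Row(1, 0), Row(2, 1), Row(3, 2), Row(4, 0), Row(5, 1)]
# Clearing bit 1 collapses level 1, which hides the level-2 row beneath it.
print([r.record_id for r in visible_rows(rows, ALL_EXPANDED & ~0b10)])
```

Because the mask is described as neither persistent nor shared, a sketch like this would hold it as per-object state and leave the stored document untouched.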

Table 39 illustrates an interface 1504 (IGrooveCollectionListener) for a client of a collection manager that wishes to be notified whenever “significant” events happen within the collection. Significant events may occur at any time and include updating, addition, deletion, reparenting, or a change in ordinal position of a collection element. The collection manager listener interface includes the following methods:

TABLE 39
interface IGrooveCollectionListener : IGrooveDispatch

OnRecordChange(IGrooveElement *i_pElement);
    Called when the data in this element has been updated or the element has been added, deleted, reparented, or its ordinal position has changed.

OnSortChange(void);
    Called when the sort order for the collection changes.
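As a rough illustration of how listener registration and the enable/disable switch might interact, the following sketch keeps a list of callbacks and suppresses delivery while notifications are disabled. The class ListenerRegistry, its snake_case method names, and the (event, record_id) callback shape are assumptions made for this example; they only loosely mirror OpenWaffle, CloseWaffle, EnableListeners, DisableListeners and AdviseListeners and are not the described COM interfaces.

```python
from typing import Callable, List

Listener = Callable[[str, int], None]  # hypothetical callback: (event, record_id)


class ListenerRegistry:
    """Hypothetical stand-in for a collection's listener bookkeeping."""

    def __init__(self) -> None:
        self._listeners: List[Listener] = []
        self._enabled = True  # notifications are enabled by default

    def open_waffle(self, listener: Listener) -> None:
        self._listeners.append(listener)

    def close_waffle(self, listener: Listener) -> None:
        self._listeners.remove(listener)

    def disable_listeners(self) -> None:
        self._enabled = False

    def enable_listeners(self) -> None:
        self._enabled = True

    def advise_listeners(self, event: str, record_id: int) -> int:
        """Notify every listener; return how many were notified."""
        if not self._enabled:
            return 0
        snapshot = list(self._listeners)  # tolerate changes during callbacks
        for listener in snapshot:
            listener(event, record_id)
        return len(snapshot)


registry = ListenerRegistry()
seen = []
registry.open_waffle(lambda event, rid: seen.append((event, rid)))
registry.advise_listeners("OnRecordChange", 7)
registry.disable_listeners()
registry.advise_listeners("OnRecordChange", 8)  # suppressed while disabled
registry.enable_listeners()
registry.advise_listeners("OnSortChange", 0)
print(seen)
```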

Table 40 illustrates an interface 1506 (IGrooveCollectionCursor) for a client of a collection manager that wants to move a cursor within the collection. A collection may have one or more cursors active at any time. The collection manager cursor interface includes the following methods:

TABLE 40
interface IGrooveCollectionCursor : IGrooveDispatch

Move(GrooveCollectionCursorPosition i_AbsolutePosition, GrooveCollectionNavigationOp i_Navigator, long i_Distance, long *o_pDistanceMoved);
    Moves the cursor in either an absolute or relative amount.
    AbsolutePosition may have the values First, Last, or Current.
    Navigator may have the following values:
        Value                     Description
        NextAny, PriorAny         Move the cursor to the next/previous source row, traversing down through child rows and up through parent rows.
        NextPeer, PriorPeer       Move the cursor to the next/previous source row at the same level, stopping if a row at a higher level is reached.
        NextParent, PriorParent   Move the cursor to the next/previous parent source row, traversing until the root row is reached.
        NextData, PriorData       Move the cursor to the next/previous row that contains a data record.
        NextUnread, PriorUnread   Move the cursor to the next/previous unread row.
    Distance sets the number of iterations to move the cursor, starting at AbsolutePosition and moving through Distance iterations of Navigator movement.
    Move returns the number of iterations the cursor was actually moved.

OpenRecord(IGrooveRecord **o_ppRecord);
    Returns an interface pointer to the record the cursor is currently set at.

The following tables illustrate allowed values for the enumerated data types listed in the above interfaces. In particular, Table 41 illustrates allowed values for the GrooveCollationOrder enumerated data type:

TABLE 41
GrooveCollationOrder

CollateAscending
    Ordered by ascending data values.
CollateDescending
    Ordered by descending data values.
CollateOrdinal
    Ordered by ordinal position.

Table 42 illustrates the allowed values for the GrooveCollectionNavigationOp enumerated data type:

TABLE 42
GrooveCollectionNavigationOp

NextAny
    Move the cursor to the next source row, traversing down through child rows and up through parent rows.
PriorAny
    Move the cursor to the previous source row, traversing down through child rows and up through parent rows.
NextPeer
    Move the cursor to the next source row at the same level, stopping if a row at a higher level is reached.
PriorPeer
    Move the cursor to the previous source row at the same level, stopping if a row at a higher level is reached.
NextParent
    Move the cursor to the next parent source row, traversing until the root row is reached.
PriorParent
    Move the cursor to the previous parent source row, traversing until the root row is reached.
NextData
    Move the cursor to the next row that contains a data record.
PriorData
    Move the cursor to the previous row that contains a data record.
NextUnread
    Move the cursor to the next unread row.
PriorUnread
    Move the cursor to the previous unread row.
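The navigation operations can be sketched over a flat, ordered list of rows. This is a minimal illustration, not the described implementation: it covers only the Any, Peer, Data and Unread pairs, and it assumes that "a row at a higher level" means a row nearer the root (a smaller Level number). The Row fields and the function move_cursor are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Row:
    level: int            # level 0 is the outermost level
    is_data: bool = True  # False for synthesized header/footer rows
    read: bool = False


def _step(rows: List[Row], pos: int, op: str) -> Optional[int]:
    """Return the index reached by one iteration of `op`, or None if blocked."""
    direction = 1 if op.startswith("Next") else -1
    kind = op[4:] if direction == 1 else op[5:]  # strip "Next" or "Prior"
    if kind not in ("Any", "Peer", "Data", "Unread"):
        raise ValueError(f"unsupported navigation op: {op}")
    start_level = rows[pos].level
    i = pos + direction
    while 0 <= i < len(rows):
        row = rows[i]
        if kind == "Any":
            return i
        if kind == "Peer":
            if row.level < start_level:  # reached a row nearer the root: stop
                return None
            if row.level == start_level:
                return i
        elif kind == "Data" and row.is_data:
            return i
        elif kind == "Unread" and not row.read:
            return i
        i += direction
    return None


def move_cursor(rows: List[Row], start: int, op: str, distance: int) -> Tuple[int, int]:
    """Apply `op` up to `distance` times; return (new position, iterations moved)."""
    pos, moved = start, 0
    for _ in range(distance):
        nxt = _step(rows, pos, op)
        if nxt is None:
            break
        pos, moved = nxt, moved + 1
    return pos, moved


rows = [
    Row(0, is_data=False, read=True),  # synthesized header
    Row(1, read=True),
    Row(2),
    Row(1),
    Row(0, is_data=False, read=True),  # synthesized header
    Row(1, read=True),
]
print(move_cursor(rows, 1, "NextPeer", 5))
print(move_cursor(rows, 0, "NextUnread", 10))
```

Returning the count of iterations actually performed mirrors the o_pDistanceMoved output parameter, so a caller can detect that the cursor stopped early.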

Table 43 illustrates the allowed values for the GrooveCollectionCursorPosition enumerated data type:

TABLE 43
GrooveCollectionCursorPosition

First
    The first row in the collection.
Last
    The last row in the collection.
Current
    The current row in the collection. This position is useful for performing relative cursor movement.

Table 44 illustrates the allowed values for the GrooveCollectionRowType enumerated data type:

TABLE 44
GrooveCollectionRowType

ROW_DATA
    A row with data values.
ROW_HEADER
    A row header, for example, column break values.
ROW_FOOTER
    A row footer, for example, column break values and an aggregated result.

Table 45 illustrates the allowed values for the GrooveCollectionSynthType enumerated data type:

TABLE 45
GrooveCollectionSynthType

BreakUnique
    Synthesized collection row indicates a change in value of categorized or sorted column. One of the other columns will have the new value.
BreakUnitDay
    Synthesized collection row is a break on the change in units of days.
BreakUnitWeek
    Synthesized collection row is a break on the change in units of weeks.
BreakUnitMonth
    Synthesized collection row is a break on the change in units of months.
BreakUnitYear
    Synthesized collection row is a break on the change in units of years.
FuncTotal
    Synthesized collection row is the result of an aggregate total function.
FuncCount
    Synthesized collection row is the result of an aggregate count function.
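To make the role of synthesized rows concrete, the sketch below interleaves data rows with a BreakUnique header whenever the categorized column changes value and a FuncCount footer closing each group. The dictionary layout, the "Count" key, the fixed Level values and the function build_rows are assumptions for illustration; the text does not specify how the rows are actually generated.

```python
from typing import Any, Dict, List

ROW_DATA, ROW_HEADER, ROW_FOOTER = "ROW_DATA", "ROW_HEADER", "ROW_FOOTER"


def build_rows(records: List[Dict[str, Any]], column: str) -> List[Dict[str, Any]]:
    """Interleave data rows with BreakUnique headers and FuncCount footers.

    `records` is assumed to be already sorted on `column`.
    """
    out: List[Dict[str, Any]] = []
    current: Any = object()  # sentinel that equals no real column value
    count = 0

    def footer() -> Dict[str, Any]:
        return {"RowType": ROW_FOOTER, "SynthKind": "FuncCount",
                "Level": 0, column: current, "Count": count}

    for rec in records:
        value = rec[column]
        if value != current:
            if count:
                out.append(footer())  # close the previous group
            out.append({"RowType": ROW_HEADER, "SynthKind": "BreakUnique",
                        "Level": 0, column: value})
            current, count = value, 0
        # Data rows carry SynthKind 0 and sit one level below their header.
        out.append({"RowType": ROW_DATA, "SynthKind": 0, "Level": 1, **rec})
        count += 1
    if count:
        out.append(footer())  # close the final group
    return out


records = [
    {"Category": "A", "Title": "x"},
    {"Category": "A", "Title": "y"},
    {"Category": "B", "Title": "z"},
]
for row in build_rows(records, "Category"):
    print(row["RowType"], row.get("SynthKind"), row.get("Title", ""))
```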

Table 46 illustrates the allowed values for the GrooveCollectionUpdateOp enumerated data type:

TABLE 46
GrooveCollectionUpdateOp

OP_ADD
    Add the record to the collection.
OP_DELETE
    Delete the record from the collection.
OP_UPDATE
    Change values of specific fields in this record, which is already in the collection.
OP_REPARENT
    Change this record's parent.
OP_CHANGE_ORDINAL
    Change the ordinal position of this record in the collection.
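One way to picture how a set of update operations might drive index maintenance is the following sketch, which keeps a single sorted index of (value, record id) pairs and adjusts it on each operation. TinyCollection and its methods are hypothetical, the operation names are passed as plain strings, OP_CHANGE_ORDINAL is omitted, and deleting a record's children along with the record is an assumption about how cascading would work.

```python
import bisect
from typing import Any, Dict, List, Optional, Tuple


class TinyCollection:
    """Hypothetical in-memory collection with one sorted index."""

    def __init__(self, sort_column: str) -> None:
        self.sort_column = sort_column
        self.records: Dict[int, Dict[str, Any]] = {}
        self.parent: Dict[int, Optional[int]] = {}
        self.index: List[Tuple[Any, int]] = []  # sorted (value, record_id) pairs

    def _index_key(self, record_id: int) -> Tuple[Any, int]:
        return (self.records[record_id][self.sort_column], record_id)

    def _index_remove(self, record_id: int) -> None:
        i = bisect.bisect_left(self.index, self._index_key(record_id))
        del self.index[i]

    def apply(self, op: str, record_id: int,
              fields: Optional[Dict[str, Any]] = None,
              parent: Optional[int] = None) -> None:
        if op == "OP_ADD":
            self.records[record_id] = dict(fields or {})
            self.parent[record_id] = parent
            bisect.insort(self.index, self._index_key(record_id))
        elif op == "OP_DELETE":
            # Assumed cascade: remove child records before the record itself.
            for child in [c for c, p in self.parent.items() if p == record_id]:
                self.apply("OP_DELETE", child)
            self._index_remove(record_id)
            del self.records[record_id]
            del self.parent[record_id]
        elif op == "OP_UPDATE":
            self._index_remove(record_id)  # key may change, so re-insert
            self.records[record_id].update(fields or {})
            bisect.insort(self.index, self._index_key(record_id))
        elif op == "OP_REPARENT":
            self.parent[record_id] = parent
        else:
            raise ValueError(f"unsupported operation: {op}")

    def sorted_ids(self) -> List[int]:
        return [rid for _, rid in self.index]


coll = TinyCollection("title")
coll.apply("OP_ADD", 1, {"title": "pear"})
coll.apply("OP_ADD", 2, {"title": "apple"})
coll.apply("OP_ADD", 3, {"title": "mango"}, parent=1)
coll.apply("OP_UPDATE", 2, {"title": "zebra"})
print(coll.sorted_ids())
```

Removing and re-inserting the index entry on OP_UPDATE keeps the index correct even when the updated field is the sort column.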

Table 47 illustrates the allowed values for the GrooveCollectionWaffleSystemColumns enumerated data type:

TABLE 47
GrooveCollectionWaffleSystemColumns

WAFFLE_ROWTYPE_COLUMN
    One of the values for GrooveCollectionRowType.
WAFFLE_SYNTHKIND_COLUMN
    If not a data row, one of the values in GrooveCollectionSynthType.
WAFFLE_RECORDID_COLUMN
    A unique identifier for the record. The RecordID must be unique within the collection, but may not be unique in other scopes.
WAFFLE_PARENT_RECORDID_COLUMN
    A reference to a parent record that contains the recordID of a record in the collection. If the record referenced in the parent recordID is deleted, this record will also be deleted from the collection.
WAFFLE_LEVEL_COLUMN
    The number of indentation levels from the root level of the hierarchy. The root level is 0.
WAFFLE_RELPOS_COLUMN
    A number between 0.0 (first row) and 100.0 (last row).
WAFFLE_READ_COLUMN
    A list of whoever has read this record. If this field is not present, no users have read the record.
WAFFLE_EXPANDED_COLUMN
    A boolean indicator for whether the row is collapsed or fully expanded.
WAFFLE_HASCHILDREN_COLUMN
    A boolean indicator for whether the row has children.

Table 48 illustrates the allowed values for the GrooveCollectionRecordID enumerated data type:

TABLE 48
GrooveCollectionRecordID

NULL_RECORD_ID
    The reserved value for the special null record id.

Table 49 illustrates the allowed values for the GrooveSortOrder enumerated data type:

TABLE 49
GrooveSortOrder

Ascending
    Collate by ascending data values.
Descending
    Collate by descending data values.

A software implementation of the above-described embodiment may comprise a series of computer instructions either fixed on a tangible medium, such as a computer readable medium, e.g. a diskette, a CD-ROM, a ROM memory, or a fixed disk, or transmissible to a computer system, via a modem or other interface device over a medium. The medium can be either a tangible medium, including, but not limited to, optical or analog communications lines, or may be implemented with wireless techniques, including but not limited to microwave, infrared or other transmission techniques. It may also be the Internet. The series of computer instructions embodies all or part of the functionality previously described herein with respect to the invention. Those skilled in the art will appreciate that such computer instructions can be written in a number of programming languages for use with many computer architectures or operating systems. Further, such instructions may be stored using any memory technology, present or future, including, but not limited to, semiconductor, magnetic, optical or other memory devices, or transmitted using any communications technology, present or future, including but not limited to optical, infrared, microwave, or other transmission technologies. It is contemplated that such a computer program product may be distributed as a removable medium with accompanying printed or electronic documentation, e.g., shrink wrapped software, pre-loaded with a computer system, e.g., on system ROM or fixed disk, or distributed from a server or electronic bulletin board over a network, e.g., the Internet or World Wide Web.

Although an exemplary embodiment of the invention has been disclosed, it will be apparent to those skilled in the art that various changes and modifications can be made which will achieve some of the advantages of the invention without departing from the spirit and scope of the invention. For example, it will be obvious to those reasonably skilled in the art that, although the description was directed to a particular hardware system and operating system, other hardware and operating system software could be used in the same manner as that described. Other aspects, such as the specific instructions utilized to achieve a particular function, as well as other modifications to the inventive concept are intended to be covered by the appended claims.

1. Apparatus for representing and managing an XML-compliant document in a memory, the XML-compliant document being composed of a plurality of elements arranged in a nested relationship, the apparatus comprising: a data document including a plurality of element objects, each element object representing a part of the XML-compliant document; and a mechanism for arranging the plurality of element objects in a hierarchy representative of the nested relationship of the elements.
2. Apparatus as recited in claim 1 wherein at least some of the elements contain textual content and wherein element objects representing the elements contain the textual content.
3. Apparatus as recited in claim 1 wherein at least some of the elements contain attributes having values and wherein element objects representing the elements contain the attribute values.
4. Apparatus as recited in claim 3 wherein the attribute values contained in the at least some elements are typed.
5. Apparatus as recited in claim 3 further comprising an attribute index containing consistent pointers to all element objects containing attribute values.
6. Apparatus as recited in claim 1 wherein the arranging mechanism comprises database pointers and wherein a database pointer in a parent element object points to child objects of the parent element object in order to arrange the parent object and child objects in a hierarchical relationship.
7. Apparatus as recited in claim 1 further comprising a schema document referenced by the data document, the schema document containing content that describes the pattern of element objects and attributes, the existence and structure of document indices, and commonly used strings in the data document.
8. Apparatus as recited in claim 7 wherein the schema document is referenced by an XML processing statement in the data document.
9. Apparatus as recited in claim 1 further comprising a binary document object for representing a data document containing binary data.
10. Apparatus as recited in claim 1 further comprising a document object for representing the data document.
11. Apparatus as recited in claim 10 wherein the document object contains links to other document objects so that the other document objects are sub-documents of the document object.
12. Apparatus as recited in claim 1 wherein each of the element objects exports a uniform interface containing methods for manipulating each of the element objects.
13. Apparatus for representing and managing an XML-compliant document in a memory, the XML-compliant document being composed of a plurality of elements, the elements being arranged in a nested relationship, the apparatus comprising: a data document including a plurality of element objects, each element object representing a part of the XML-compliant document; and a collection manager that maps the element objects into a tabular data structure including an index structure.
14. Apparatus as recited in claim 13 further comprising a record set engine that is responsive to user commands for propagating a set of updates for the tabular data structure to the collection manager.
15. Apparatus as recited in claim 14 wherein the collection manager further comprises an update mechanism which responds to the set of updates by updating the index structure.
16. Apparatus as recited in claim 15 wherein the collection manager further comprises a notification system that notifies the users when changes are made to the tabular data structure.
17. Apparatus as recited in claim 16 wherein the collection manager further comprises a navigation mechanism for creating a cursor to allow the users to navigate within the tabular data structure.
18. A method for representing and managing an XML-compliant document in a memory, the XML-compliant document being composed of a plurality of elements arranged in a nested relationship, the method comprising: (a) creating a data document in the memory including a plurality of element objects, each element object representing a part of the XML-compliant document; and (b) arranging the plurality of element objects in a hierarchy representative of the nested relationship of the elements.
19. A method as recited in claim 18 wherein at least some of the elements contain textual content and wherein element objects representing the elements contain the textual content.
20. A method as recited in claim 18 wherein at least some of the elements contain attributes having values and wherein element objects representing the elements contain the attribute values.
21. A method as recited in claim 20 wherein the attribute values contained in the at least some elements are typed.
22. A method as recited in claim 20 further comprising an attribute index containing consistent pointers to all element objects containing attribute values.
23. A method as recited in claim 18 wherein step (b) comprises creating a database pointer in a parent element object which pointer points to child objects of the parent element object in order to arrange the parent object and child objects in a hierarchical relationship.
24. A method as recited in claim 18 further comprising (c) creating a schema document referenced by the data document in the memory, the schema document containing content that describes the pattern of element objects and attributes, the existence and structure of document indices, and commonly used strings in the data document.
25. A method as recited in claim 24 wherein step (c) comprises creating the schema document referenced by an XML processing statement in the data document.
26. A method as recited in claim 18 further comprising (d) creating a binary document object in the memory for representing a data document containing binary data.
27. A method as recited in claim 18 further comprising (e) creating a document object in the memory for representing the data document.
28. A method as recited in claim 27 wherein the document object contains links to other document objects so that the other document objects are sub-documents of the document object.
29. A method as recited in claim 18 wherein each of the element objects exports a uniform interface containing methods for manipulating each of the element objects.
30. A method for representing and managing an XML-compliant document in a memory, the XML-compliant document being composed of a plurality of elements, the elements being arranged in a nested relationship, the method comprising: (a) creating a data document in the memory including a plurality of element objects, each element object representing a part of the XML-compliant document; and (b) mapping the element objects into a tabular data structure including an index structure.
31. A method as recited in claim 30 further comprising (c) propagating a set of updates for the tabular data structure to the collection manager in response to user commands.
32. A method as recited in claim 31 further comprising (d) updating the index structure in response to the set of updates.
33. A method as recited in claim 32 further comprising (e) notifying the users when changes are made to the tabular data structure.
34. A method as recited in claim 33 further comprising (f) creating a cursor to allow the users to navigate within the tabular data structure.
35. A computer program product for representing and managing an XML-compliant document in a memory, the XML-compliant document being composed of a plurality of elements arranged in a nested relationship, the computer program product comprising a computer usable medium having computer readable program code thereon, including: program code for creating a data document in the memory including a plurality of element objects, each element object representing a part of the XML-compliant document; and program code for arranging the plurality of element objects in a hierarchy representative of the nested relationship of the elements.
36. A computer program product for representing and managing an XML-compliant document in a memory, the XML-compliant document being composed of a plurality of elements, the elements being arranged in a nested relationship, the computer program product comprising a computer usable medium having computer readable program code thereon, including: program code for creating a data document in the memory including a plurality of element objects, each element object representing a part of the XML-compliant document; and program code for mapping the element objects into a tabular data structure including an index structure.
37. A computer data signal embodied in a carrier wave for representing and managing an XML-compliant document in a memory, the XML-compliant document being composed of a plurality of elements arranged in a nested relationship, the computer data signal comprising: program code for creating a data document in the memory including a plurality of element objects, each element object representing a part of the XML-compliant document; and program code for arranging the plurality of element objects in a hierarchy representative of the nested relationship of the elements.
38. A computer data signal for representing and managing an XML-compliant document in a memory, the XML-compliant document being composed of a plurality of elements, the elements being arranged in a nested relationship, the computer data signal comprising: program code for creating a data document in the memory including a plurality of element objects, each element object representing a part of the XML-compliant document; and program code for mapping the element objects into a tabular data structure including an index structure.